
Accelerating 3D Protein Structure Similarity Searching on Microsoft Azure Cloud with Local Replicas of Macromolecular Data

  • Conference paper
  • Book: Parallel Processing and Applied Mathematics

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9574)

Abstract

Searching for similarities among 3D protein structures deposited in macromolecular data repositories, such as the Protein Data Bank, is a time-consuming process in structural bioinformatics. When performed in one-to-many or many-to-many mode, the process requires increased computational resources. Moreover, the exponential growth in the number of protein structures in the Protein Data Bank makes it necessary to prepare computer systems that can deal with such huge volumes of data. Cloud computing provides both theoretically infinite computational resources and a great opportunity to scale systems out and up. In this paper, we show how 3D protein structure similarity searching can be scaled out on the Microsoft Azure cloud and performed by a loosely coupled, many-task computing system with local replicas of macromolecular data.

This project was supported by Microsoft Research (USA) through the Microsoft Azure for Research Award granted to the Cloud4Psi project.
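The abstract only outlines the architecture, so the following Python sketch is a rough illustration under stated assumptions, not the authors' implementation. It assumes a one-to-many search is split into independent batches of targets (the "many tasks"), that each worker owns a local replica directory of structure files, and that the remote repository is contacted only on a cache miss. Every name here (`LocalReplica`, `fetch_from_remote`, `compare`, `run_search`, `comparisons`) is hypothetical; the remote download is a stub and `compare` is a placeholder score rather than a real structural alignment. The `comparisons` helper shows why many-to-many mode is costlier: n(n-1)/2 pairs instead of n.

```python
"""Hypothetical sketch of a many-task, one-to-many structure similarity search
with per-worker local replicas. Not the Cloud4Psi API; all names are assumed."""
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import tempfile


def fetch_from_remote(pdb_id: str) -> str:
    # Stub standing in for a download from a remote repository such as the PDB.
    # It returns a dummy record so the sketch runs offline.
    return f"HEADER {pdb_id}\n"


class LocalReplica:
    """Keeps structure files on a worker's local disk after the first fetch."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.remote_fetches = 0  # number of cache misses served remotely

    def get(self, pdb_id: str) -> str:
        path = self.root / f"{pdb_id}.pdb"
        if not path.exists():
            self.remote_fetches += 1
            path.write_text(fetch_from_remote(pdb_id))
        return path.read_text()


def compare(query: str, target: str) -> float:
    # Placeholder score; a real system would run a structural alignment here.
    return float(len(set(query) & set(target)))


def comparisons(n: int, mode: str) -> int:
    # one-to-many: one query against n targets; many-to-many: all unordered pairs.
    return n if mode == "one-to-many" else n * (n - 1) // 2


def make_tasks(targets: list[str], batch_size: int) -> list[list[str]]:
    # Split the targets into independent batches; each batch is one task.
    return [targets[i:i + batch_size] for i in range(0, len(targets), batch_size)]


def worker(replica: LocalReplica, query_id: str, batches: list[list[str]]):
    # One thread per replica, so a replica is never touched concurrently.
    query = replica.get(query_id)
    return [(t, compare(query, replica.get(t))) for batch in batches for t in batch]


def run_search(query_id: str, targets: list[str], roots: list[Path], batch_size: int):
    replicas = [LocalReplica(r) for r in roots]
    tasks = make_tasks(targets, batch_size)
    n = len(replicas)
    # Round-robin assignment: worker k gets tasks k, k+n, k+2n, ...
    per_worker = [tasks[k::n] for k in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = pool.map(worker, replicas, [query_id] * n, per_worker)
        hits = [hit for part in parts for hit in part]
    hits.sort(key=lambda h: h[1], reverse=True)  # best-scoring targets first
    return hits, replicas


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        roots = [Path(a), Path(b)]
        targets = ["1ABC", "2XYZ", "3DEF", "4GHI", "5JKL"]
        _, first = run_search("9QRY", targets, roots, batch_size=2)
        _, second = run_search("9QRY", targets, roots, batch_size=2)
        print(sum(r.remote_fetches for r in first), sum(r.remote_fetches for r in second))
```

In this sketch the second run is served entirely from the local replicas, which is the effect the title's "local replicas of macromolecular data" suggests; how the paper actually synchronizes and stores replicas on Azure is not described here.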

Acknowledgements

We would like to thank Microsoft Research for providing us with free access to the computational resources of the Microsoft Azure cloud under the Microsoft Azure for Research Award program. Further development of the system will be carried out by the Cloud4Proteins non-profit scientific group (http://www.zti.aei.polsl.pl/w3/dmrozek/science/cloud4proteins.htm).

Author information

Corresponding author

Correspondence to Dariusz Mrozek.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Mrozek, D., Kutyła, T., Małysiak-Mrozek, B. (2016). Accelerating 3D Protein Structure Similarity Searching on Microsoft Azure Cloud with Local Replicas of Macromolecular Data. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. Lecture Notes in Computer Science, vol 9574. Springer, Cham. https://doi.org/10.1007/978-3-319-32152-3_24

  • DOI: https://doi.org/10.1007/978-3-319-32152-3_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32151-6

  • Online ISBN: 978-3-319-32152-3

  • eBook Packages: Computer Science, Computer Science (R0)