Abstract
Searching for similarities among 3D protein structures deposited in macromolecular data repositories, such as the Protein Data Bank, is one of the most time-consuming processes in structural bioinformatics. When performed in a one-to-many or many-to-many mode, the process requires substantial computational resources. Moreover, the exponential growth in the number of protein structures in the Protein Data Bank makes it necessary to prepare computer systems that can handle such large volumes of data. Cloud computing provides both theoretically unlimited computational resources and the ability to scale systems out and up. In this paper, we show how 3D protein structure similarity searching can be scaled out on the Microsoft Azure cloud and performed by a loosely coupled, many-task computing system with local replicas of macromolecular data.
This project was supported by Microsoft Research in the USA through the Microsoft Azure for Research Award granted for the Cloud4Psi project.
Acknowledgements
We would like to thank Microsoft Research for providing us with free access to the computational resources of the Microsoft Azure cloud under the Microsoft Azure for Research Award program. Further development of the system will be carried out by Cloud4Proteins, a non-profit scientific group (http://www.zti.aei.polsl.pl/w3/dmrozek/science/cloud4proteins.htm).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Mrozek, D., Kutyła, T., Małysiak-Mrozek, B. (2016). Accelerating 3D Protein Structure Similarity Searching on Microsoft Azure Cloud with Local Replicas of Macromolecular Data. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. Lecture Notes in Computer Science, vol 9574. Springer, Cham. https://doi.org/10.1007/978-3-319-32152-3_24
DOI: https://doi.org/10.1007/978-3-319-32152-3_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32151-6
Online ISBN: 978-3-319-32152-3
eBook Packages: Computer Science, Computer Science (R0)