
Exploiting performance characterization of BLAST in the grid

Cluster Computing

Abstract

Sequence analysis has become essential to the study of genomes and to biological research in general. The Basic Local Alignment Search Tool (BLAST) is the most widely accepted method for performing query searches and analyzing discovered genes. To cope with growing data sizes and to speed up job runtimes, scientists are resorting to grid computing technologies. However, grid environments are characterized by the dynamic, heterogeneous, and transient state of available resources, which is a major hindrance to users trying to realize their desired levels of service. This paper analyzes the performance characteristics of NCBI BLAST on several resources and captures the influence of resource characteristics and job parameters on BLAST job runtime across those resources. The results are summarized as a set of principles characterizing the performance of NCBI BLAST across homogeneous and heterogeneous environments. These principles are then applied and verified through the creation of a grid-enabled BLAST wrapper application called Dynamic BLAST. Results show runtime savings of up to 50% and a resource utilization improvement of approximately 40%.
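To make the idea of exploiting resource characteristics concrete, the sketch below shows one generic way a wrapper could divide a batch of query sequences among heterogeneous resources in proportion to their relative speed. It is an illustration only and is not taken from the paper: the function names `split_queries` and `estimated_makespan`, the relative-speed figures, and the assumption that every query costs the same are all hypothetical.

```python
from typing import Dict, List


def split_queries(queries: List[str], speeds: Dict[str, float]) -> Dict[str, List[str]]:
    """Assign queries to resources in proportion to each resource's relative speed.

    `speeds` maps a (hypothetical) resource name to an assumed relative
    throughput; a resource with speed 3.0 is treated as three times faster
    than one with speed 1.0.
    """
    if not speeds or any(s <= 0 for s in speeds.values()):
        raise ValueError("speeds must be non-empty and strictly positive")

    total = sum(speeds.values())
    names = list(speeds)

    # Ideal fractional share per resource, rounded down to whole queries.
    shares = {n: len(queries) * speeds[n] / total for n in names}
    counts = {n: int(shares[n]) for n in names}

    # Give any leftover queries to the resources with the largest remainders.
    leftover = len(queries) - sum(counts.values())
    by_remainder = sorted(names, key=lambda n: shares[n] - counts[n], reverse=True)
    for n in by_remainder[:leftover]:
        counts[n] += 1

    # Slice the query list into contiguous chunks, one per resource.
    plan: Dict[str, List[str]] = {}
    start = 0
    for n in names:
        plan[n] = queries[start:start + counts[n]]
        start += counts[n]
    return plan


def estimated_makespan(plan: Dict[str, List[str]], speeds: Dict[str, float],
                       cost_per_query: float = 1.0) -> float:
    """Estimate the finish time of the slowest resource.

    Assumes (for illustration) that every query costs the same amount of work.
    """
    return max(len(chunk) * cost_per_query / speeds[n] for n, chunk in plan.items())


if __name__ == "__main__":
    queries = [f"query_{i}" for i in range(12)]
    speeds = {"slow_cluster": 1.0, "fast_cluster": 3.0}  # assumed values
    plan = split_queries(queries, speeds)
    for name, chunk in plan.items():
        print(name, len(chunk))
    print("estimated makespan:", estimated_makespan(plan, speeds))
```

Under these assumed speeds, the proportional split lets both resources finish at about the same time, whereas an even split would leave the faster resource idle while the slower one works through its half. Real BLAST runtimes also depend on query length and database size, which this sketch deliberately ignores.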



Author information

Corresponding author

Correspondence to Purushotham Bangalore.

About this article

Cite this article

Afgan, E., Bangalore, P. Exploiting performance characterization of BLAST in the grid. Cluster Comput 13, 385–395 (2010). https://doi.org/10.1007/s10586-010-0121-z

  • DOI: https://doi.org/10.1007/s10586-010-0121-z
