Abstract
Sequence analysis has become essential to the study of genomes and to biological research in general. The Basic Local Alignment Search Tool (BLAST) is the most widely accepted method for performing query searches and analyzing discovered genes. To cope with growing data sizes and to shorten job runtimes, scientists are resorting to grid computing technologies. However, grid environments are characterized by the dynamic, heterogeneous, and transient state of available resources, which is a major hindrance to users trying to realize their desired levels of service. This paper analyzes the performance characteristics of NCBI BLAST on several resources and captures the influence of resource characteristics and job parameters on BLAST job runtime across those resources. The results are summarized as a set of principles characterizing the performance of NCBI BLAST across homogeneous and heterogeneous environments. These principles are then applied and verified through the creation of a grid-enabled BLAST wrapper application called Dynamic BLAST. Results show runtime savings of up to 50% and a resource utilization improvement of approximately 40%.
Cite this article
Afgan, E., Bangalore, P. Exploiting performance characterization of BLAST in the grid. Cluster Comput 13, 385–395 (2010). https://doi.org/10.1007/s10586-010-0121-z
DOI: https://doi.org/10.1007/s10586-010-0121-z