Abstract
Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multidimensional parameter space, whose dimensions are the applications' input performance parameters that are known to affect execution time. Some performance parameters, such as the grouping of workflow components and their mapping to machines, do not affect the accuracy of the analysis; others may require trading the output quality of individual components (and of the whole workflow) for performance. This paper describes an integrated framework that supports performance optimization along multiple such parameters. Using two real-world applications in the spatial, multidimensional data analysis domain, we present an experimental evaluation of the proposed framework.
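The view of optimization as a search over a multidimensional parameter space can be illustrated with a minimal sketch. It is not the framework described in this paper: the parameter names (`chunk_size`, `num_nodes`, `resolution`), their values, and the cost model in `toy_model` are all hypothetical, chosen only to show two parameters that affect performance alone and one that trades output quality for speed. The search is a plain exhaustive enumeration that keeps the fastest configuration meeting a minimum quality requirement.

```python
from itertools import product

# Hypothetical parameter space; names and values are illustrative only.
PARAMETER_SPACE = {
    "chunk_size": [256, 512, 1024],  # affects performance only
    "num_nodes": [4, 8, 16],         # affects performance only
    "resolution": [0.25, 0.5, 1.0],  # trades output quality for speed
}


def toy_model(config):
    """Return (estimated_time, estimated_quality) from a made-up cost model."""
    work = 1000.0 * config["resolution"] ** 2
    overhead = 2000.0 / config["chunk_size"] + 0.5 * config["num_nodes"]
    time = work / config["num_nodes"] + overhead
    quality = config["resolution"]
    return time, quality


def search(space, min_quality, evaluate=toy_model):
    """Exhaustively search the space for the fastest acceptable configuration.

    Returns (config, time, quality), or None if no configuration
    reaches min_quality.
    """
    names = list(space)
    best = None
    for values in product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        time, quality = evaluate(config)
        if quality < min_quality:
            continue  # violates the quality requirement
        if best is None or time < best[1]:
            best = (config, time, quality)
    return best


if __name__ == "__main__":
    for threshold in (1.0, 0.25):
        print(threshold, search(PARAMETER_SPACE, threshold))
```

Relaxing the quality threshold enlarges the set of acceptable configurations, so the best achievable time can only stay the same or improve. Exhaustive enumeration is used here purely for clarity; it does not scale to large parameter spaces.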
Cite this article
Kumar, V.S., Kurc, T., Ratnakar, V. et al. Parameterized specification, configuration and execution of data-intensive scientific workflows. Cluster Comput 13, 315–333 (2010). https://doi.org/10.1007/s10586-010-0133-8
DOI: https://doi.org/10.1007/s10586-010-0133-8