Parameterized specification, configuration and execution of data-intensive scientific workflows

Cluster Computing

Abstract

Data analysis processes in scientific applications can be expressed as coarse-grained workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multidimensional parameter space consisting of input performance parameters to the applications that are known to affect their execution times. While some performance parameters, such as the grouping of workflow components and their mapping to machines, do not affect the accuracy of the analysis, others may require trading the output quality of individual components (and of the whole workflow) for performance. This paper describes an integrated framework that supports performance optimizations along multiple such parameters. Using two real-world applications in the spatial, multidimensional data analysis domain, we present an experimental evaluation of the proposed framework.
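
The view of optimization as a search over a parameter space can be illustrated with a minimal sketch. It is not the framework described in the paper: the parameter names (chunk_size, num_nodes, resolution), their values, and the cost model are all hypothetical, and exhaustive enumeration stands in for whatever search strategy a real system would use. The sketch picks the fastest configuration whose output quality stays above a user-supplied floor.

```python
from itertools import product

# Hypothetical parameter space; names and values are illustrative only.
PARAM_SPACE = {
    "chunk_size": [256, 512, 1024],  # assumed to affect performance only
    "num_nodes": [2, 4, 8],          # assumed to affect performance only
    "resolution": [0.25, 0.5, 1.0],  # assumed to trade quality for speed
}


def toy_model(chunk_size, num_nodes, resolution):
    """Made-up cost model returning (execution_time, quality)."""
    work = 1000.0 * resolution ** 2      # more detail means more work
    compute = work / num_nodes           # work is split across nodes
    coordination = 2.0 * num_nodes       # more nodes, more coordination
    chunk_overhead = 4096 / chunk_size   # smaller chunks, more overhead
    return compute + coordination + chunk_overhead, resolution


def best_configuration(min_quality):
    """Exhaustively search the space for the fastest acceptable setting.

    Returns (config, time), or None if no configuration meets min_quality.
    """
    names = list(PARAM_SPACE)
    best = None
    for values in product(*(PARAM_SPACE[name] for name in names)):
        config = dict(zip(names, values))
        time, quality = toy_model(**config)
        if quality < min_quality:
            continue  # reject settings that degrade output too much
        if best is None or time < best[1]:
            best = (config, time)
    return best


if __name__ == "__main__":
    print(best_configuration(min_quality=0.5))
```

Under this toy model, lowering the quality floor changes which machine count is best, which is the kind of interaction between quality-preserving and quality-trading parameters that the abstract refers to.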

Author information

Corresponding author

Correspondence to Vijay S. Kumar.

About this article

Cite this article

Kumar, V.S., Kurc, T., Ratnakar, V. et al. Parameterized specification, configuration and execution of data-intensive scientific workflows. Cluster Comput 13, 315–333 (2010). https://doi.org/10.1007/s10586-010-0133-8

  • DOI: https://doi.org/10.1007/s10586-010-0133-8