Towards Case-Based Support for e-Science Workflow Generation by Mining Provenance

  • Conference paper
Advances in Case-Based Reasoning (ECCBR 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5239)

Abstract

e-Science brings large-scale computation to bear on scientific problems, often by performing sequences of computational tasks organized into workflows and executed on distributed Web resources. Sophisticated AI tools have been developed to apply knowledge-rich methods to compose scientific workflows by generative planning, but the required knowledge can be difficult to acquire. Current work by the cyberinfrastructure community aims to routinely capture provenance during workflow execution, which would provide a new experience-based knowledge source for workflow generation: large-scale databases of workflow execution traces. This paper proposes exploiting these databases with a “knowledge light” approach to reuse, applying CBR methods to those traces to support scientists’ workflow generation process. This paper introduces e-Science workflows as a CBR domain, sketches key technical issues, and illustrates directions towards addressing these issues through ongoing research on Phala, a system which supports workflow generation by aiding re-use of portions of prior workflows. The paper uses workflow data collected by the myGrid and myExperiment projects in experiments which suggest that Phala’s methods have promise for assisting workflow composition in the context of scientific experimentation.

This material is based on work supported by the National Science Foundation under Grant No. OCI-0721674. Our thanks to Beth Plale, Yogesh Simmhan, and the rest of the SDCI group at Indiana University for their vital contributions to this work, and to Yogesh Simmhan and the anonymous reviewers for valuable comments on a draft of this paper.

Editor information

Klaus-Dieter Althoff, Ralph Bergmann, Mirjam Minor, Alexandre Hanft

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Leake, D., Kendall-Morwick, J. (2008). Towards Case-Based Support for e-Science Workflow Generation by Mining Provenance. In: Althoff, KD., Bergmann, R., Minor, M., Hanft, A. (eds) Advances in Case-Based Reasoning. ECCBR 2008. Lecture Notes in Computer Science, vol 5239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85502-6_18

  • DOI: https://doi.org/10.1007/978-3-540-85502-6_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85501-9

  • Online ISBN: 978-3-540-85502-6

  • eBook Packages: Computer Science, Computer Science (R0)
