Extensibility Challenges of Scientific Workflow Management Systems

  • Conference paper
Human Interface and the Management of Information (HCII 2023)

Abstract

Researchers compose scientific workflows for complex scientific experiments and simulations by connecting tools and data in a pipeline. The usability of a scientific workflow management system (SWfMS) largely depends on the availability of the necessary tools in the system and the simplicity of their usage. Scientific experiments are highly diverse and need a wide variety of tools. Because of the overwhelming number of tools available in the public domain, an SWfMS cannot preinstall every tool required for such varied experiments. Hence, an extensibility mechanism for integrating external tools is essential to the flexibility of an SWfMS. Tools are developed independently by different teams, each using its preferred or most suitable programming language, and may run in different operating environments. Tool integration is challenging because of the many development languages involved and the potentially differing operating environments of the SWfMS and the tools. The tools themselves may also not be robust enough for workflow integration. State-of-the-art SWfMSs such as Galaxy and KNIME are web-based and can serve hundreds of users simultaneously. End-users may want to quickly integrate their own code into an SWfMS as a tool and use it in a workflow model, but many tools require a system configuration change that end-users are not authorized to make. An integrated tool must also fit the workflow pipeline through its input and output datasets. End-users therefore need an efficient user interface that lets them integrate tools by themselves. We created 50 workflows in the image processing, bioinformatics, and software analytics domains using the VizSciFlow SWfMS and gathered the challenges we encountered while extending it with tools for these workflows through its extensibility interface. In this paper, we describe these challenges and propose solutions with the help of two case studies in which we developed two real-world workflow products: CoGe’s SynMap workflow in the bioinformatics domain, and source code clone detection and validation in the software analytics domain.



Author information

Corresponding author

Correspondence to Muhammad Mainul Hossain.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hossain, M.M., Roy, B., Roy, C., Schneider, K. (2023). Extensibility Challenges of Scientific Workflow Management Systems. In: Mori, H., Asahi, Y. (eds) Human Interface and the Management of Information. HCII 2023. Lecture Notes in Computer Science, vol 14016. Springer, Cham. https://doi.org/10.1007/978-3-031-35129-7_4

  • DOI: https://doi.org/10.1007/978-3-031-35129-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-35128-0

  • Online ISBN: 978-3-031-35129-7

  • eBook Packages: Computer Science, Computer Science (R0)
