Abstract
Researchers compose scientific workflows for complex experiments and simulations by connecting tools and data in a pipeline. The usability of a scientific workflow management system (SWfMS) depends largely on the availability of the necessary tools in the system and the simplicity of their use. Scientific experiments are highly diverse and need a wide variety of tools. Because of the overwhelming number of tools available in the public domain, an SWfMS cannot preinstall every tool required for such varied experiments. Hence, an extensibility mechanism for integrating external tools is essential to the flexibility of an SWfMS. Tools are developed independently by different teams in their preferred or most suitable programming languages, and they may run in different operating environments. Tool integration is challenging because of the many development languages involved and the potentially differing operating environments of the SWfMS and the tools. Moreover, the tools may not be robust enough for workflow integration. State-of-the-art SWfMSs such as Galaxy and KNIME are web-based and can serve hundreds of users simultaneously. End-users may want to quickly integrate their own code into an SWfMS as a tool and use it in a workflow model, but many tools require system configuration changes that end-users are not authorized to make. The integrated tool must also fit into the workflow pipeline through its input and output datasets. End-users therefore need an efficient user interface that lets them integrate tools by themselves. We created 50 workflows in the image processing, bioinformatics, and software analytics domains using the VizSciFlow SWfMS, and we collected the challenges we encountered while extending it, through its extensibility interface, with the tools these workflows required.
In this paper, we describe these challenges and propose solutions with the help of two case studies, in which we developed two real-world workflow products: CoGe’s SynMap workflow in the bioinformatics domain and source code clone detection and validation in the software analytics domain.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hossain, M.M., Roy, B., Roy, C., Schneider, K. (2023). Extensibility Challenges of Scientific Workflow Management Systems. In: Mori, H., Asahi, Y. (eds) Human Interface and the Management of Information. HCII 2023. Lecture Notes in Computer Science, vol 14016. Springer, Cham. https://doi.org/10.1007/978-3-031-35129-7_4
DOI: https://doi.org/10.1007/978-3-031-35129-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35128-0
Online ISBN: 978-3-031-35129-7
eBook Packages: Computer Science, Computer Science (R0)