Published May 2, 2023 | Version v1
Project deliverable | Open

EOSC-Life Implementation of a mechanism for publishing and sharing workflows across instances of the environment

Description

Just as data is expected to be FAIR (Findable, Accessible, Interoperable, Reusable), so are workflows: specialised forms of software that encode processing pipelines and analytical know-how. Sharing workflows opens them to review and reuse, makes them citable with due credit and attribution to their authors, and is a step towards reproducible data results, with the method accurately documented and the provenance lineage of derived data recorded. Reusing workflows, including adapting them for new data, parameters, or steps, spreads processing know-how and improves productivity and reliability, saving labour. Using workflows as an access point to the tools and computational resources of EOSC democratises the use of computational platforms and makes available computational methods that are beyond the skills of a large body of researchers. Producing workflows is labour-intensive and highly skilled work, and as such they are important assets to be exchanged in EOSC. Yet across the BMS RIs, workflows are scattered in private or platform-specific repositories and hidden in containers, data repositories or supplementary materials.

The aim of this deliverable is to overcome this fragmentation without sacrificing the autonomy of workflow developers and users by creating WorkflowHub [1], a unifying workflow registry that captures rich metadata for findability and supports value-added services for workflow testing using the LifeMonitor service. WorkflowHub is workflow-manager agnostic: it supports workflows in their native repositories and instances and links to registries such as bio.tools, to support discovery across the fragmented EOSC ecosystem. To date, 319 workflows have been registered from 14 different workflow managers; the most popular are Galaxy, Nextflow, Snakemake, Python scripts and Jupyter Notebooks. Of these, 229 workflows are linked to an external URL, 71 of which point to a git repository. WorkflowHub also supports the manual uploading and storage of files, and as such also acts as a repository: currently, 90 workflows are uploaded as files. The registry provides a common API to simplify access for tool developers and uses the GA4GH TRS API to support the direct execution of workflows. It also supports workflow snapshot preservation, DOI publishing, citation and monitoring, partnering with DataCite in the scholarly communication landscape.
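
As an illustration of how a client might address a registered workflow through the GA4GH TRS API mentioned above, the following minimal sketch builds TRS v2 request URLs without performing any network access. The path layout (`/tools/{id}/versions/{version_id}/{type}/descriptor`) follows the GA4GH Tool Registry Service v2 specification; the base URL, the function names and the example identifiers are assumptions made for illustration and are not taken from the deliverable.

```python
from urllib.parse import quote

# Assumption: base URL of a TRS v2 endpoint. It is not stated in the text
# above, so replace it with the endpoint of the registry you actually use.
TRS_BASE = "https://workflowhub.eu/ga4gh/trs/v2"


def trs_tool_url(tool_id: str, base: str = TRS_BASE) -> str:
    """Return the URL of the TRS record for one registered workflow ("tool")."""
    # Identifiers are percent-encoded so that characters such as "/" are safe.
    return f"{base}/tools/{quote(tool_id, safe='')}"


def trs_descriptor_url(
    tool_id: str,
    version_id: str,
    descriptor_type: str,
    base: str = TRS_BASE,
) -> str:
    """Return the URL of the primary descriptor for one workflow version.

    descriptor_type is a TRS descriptor type string such as "GALAXY" or "CWL".
    """
    version = quote(version_id, safe="")
    return (
        f"{trs_tool_url(tool_id, base)}"
        f"/versions/{version}/{descriptor_type}/descriptor"
    )
```

A workflow engine or client could pass the resulting URL to any HTTP library to fetch the workflow descriptor; the identifiers `"42"` and `"1"` used in such a call would be hypothetical placeholders for a real workflow and version.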

The Hub’s emphasis is on machine- and human-readable metadata to drive FAIR capability, which required a new workflow metadata framework to be developed and adopted. Standardised workflow identifiers and metadata descriptions support workflow discovery, reuse, preservation, interoperability and monitoring, as well as metadata harvesting using standard protocols. Because workflows are multi-component objects (requiring links to test data, example runs, explanatory documentation, etc.), we use the RO-Crate specification for packaging workflows; RO-Crate is an implementation mechanism for FAIR Digital Objects as envisioned by the EOSC Interoperability Framework. These digital objects are used to exchange workflows, their metadata and their companion objects (e.g., workflow tests) between the Hub, Workflow Managers and workflow services such as LifeMonitor, and to deposit workflows in EOSC long-term preservation archives such as Zenodo.
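
To make the packaging idea concrete, the sketch below assembles a minimal `ro-crate-metadata.json` document for a single workflow file, following the general shape of RO-Crate 1.1 and the Workflow RO-Crate profile (a metadata descriptor, a root dataset whose `mainEntity` is the workflow, and a workflow entity typed as `ComputationalWorkflow`). It is not the Hub's own implementation: the function name, the file name `workflow.ga`, the local language identifier `#galaxy` and the workflow title are all hypothetical, and a real crate would normally carry much richer metadata (authors, licence, tests, example data).

```python
import json


def minimal_workflow_crate(workflow_file, language_id, language_name, title):
    """Build a minimal RO-Crate metadata document for one workflow file.

    All argument values are supplied by the caller; nothing here is read
    from, or sent to, a registry.
    """
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # Metadata descriptor: identifies this file as an RO-Crate.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # Root dataset: the crate itself, pointing at its main workflow.
                "@id": "./",
                "@type": "Dataset",
                "name": title,
                "hasPart": [{"@id": workflow_file}],
                "mainEntity": {"@id": workflow_file},
            },
            {   # The workflow file, typed as a computational workflow.
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": title,
                "programmingLanguage": {"@id": language_id},
            },
            {   # The workflow language (the workflow manager's format).
                "@id": language_id,
                "@type": "ComputerLanguage",
                "name": language_name,
            },
        ],
    }


def crate_to_json(crate):
    """Serialise the crate as the text of ro-crate-metadata.json."""
    return json.dumps(crate, indent=2)
```

Calling `crate_to_json(minimal_workflow_crate("workflow.ga", "#galaxy", "Galaxy", "Example workflow"))` yields JSON-LD text that could be saved next to the workflow file; companion objects such as test data would be added as further entries in `hasPart`.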

Files

EOSC-Life_D2.3_Implementation of a mechanism for publishing and sharing workflows across instances of the environment_May 2023.pdf

Additional details

Funding

EOSC-Life – Providing an open collaborative space for digital biology in Europe (grant no. 824087)
European Commission