YAWL: An open source Business Process Management System from science for science

YAWL (Yet Another Workflow Language) is an open source Business Process Management System, first released in 2003. YAWL grew out of a university research environment to become a unique system that has been deployed worldwide as a laboratory environment for research in Business Process Management and as a productive system in other scientific domains.


Motivation and significance
A business process is a collection of identifiable tasks that need to be performed in a certain order to achieve some business goal. Fig. 1 shows a graphical model of a simple example business process [1]. The order in which tasks are executed (the control flow) is depicted by arrows and certain operators for forking and merging. The resources, i.e. the participant roles that are to perform the tasks, are depicted by colours here. Business Process Management Systems (BPMSs) add a third perspective, namely the data that are needed to perform each task. With defined specifications encapsulating these three perspectives, a BPMS can generate an IT system to support and partially automate business processes.
At the turn of the century, there existed many commercial and open source BPMSs, each coming with its own graphical notation. There were two major efforts to standardise these notations. The first was a consortium of the major players in the field. The results were serialisation standards for business process definitions, namely XPDL 1.0 and XPDL 2.0, and a graphical language called BPMN [2].
The second endeavour undertook a scientific analysis of all patterns of constructs used in BPM systems at that time [3,4]. Deriving from that work was a minimal but highly expressive graphical language together with a formal definition of its semantics. The implementation of this language yielded yet another BPMS, called YAWL [5]. By design, YAWL should be the most powerful language for process specification. It offers comprehensive support for the vast majority of the identified control-flow, data, resource, and exception handling patterns, and thus YAWL is able to manage processes of practically any complexity [5].
Over the years, YAWL has become a full-fledged BPMS that emphasises modularity, extensibility, robustness and ease of use, and is not only a research platform for Business Process Management but can be used productively in a wide range of scientific and organisational settings.

Software description
YAWL is a BPMS consisting of an editor for the definition of process specifications and an engine for their execution. Process specifications contain all perspectives that are necessary for process automation: the control flow, the resources, and the data. The control flow is edited as a sequence of tasks in a graphical form as shown in Fig. 1. How resources and data for each task are specified is discussed in Section 2.2.1.

Software architecture
The YAWL environment is based on a service-oriented architecture (SOA), with a core execution engine and a set of ancillary components implemented as RESTful Web services. This modular design allows for the modification and addition of components, according to specific requirements. A high-level overview of the YAWL architecture is shown in Fig. 2. The YAWL Engine is the central component that manages the creation and execution of cases, and keeps track of where each case is in its control flow, i.e. its current state. When a human task is ready to be performed, the engine deploys it to the Resource Service, which then assigns a user, in accordance with the process specification. Automated tasks, for example the sending of an e-mail or intensive data transformations, will be deployed to bespoke services. In a laboratory setting, for example, the integration of a machine that performs certain experiments can be realised by developing a dedicated support service for it.
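The routing described above can be pictured with a minimal sketch. It is not YAWL's actual API: the `Engine` and `WorkItem` classes and the service names are hypothetical, and real YAWL services are reached over HTTP rather than through in-process callbacks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WorkItem:
    """Hypothetical unit of work handed out by the engine."""
    case_id: str
    task: str
    automated: bool
    data: dict = field(default_factory=dict)


class Engine:
    """Toy stand-in for a core engine that routes enabled work items."""

    def __init__(self) -> None:
        self.services: Dict[str, Callable[[WorkItem], str]] = {}
        self.state: Dict[str, List[str]] = {}  # case id -> tasks dispatched so far

    def register(self, name: str, handler: Callable[[WorkItem], str]) -> None:
        self.services[name] = handler

    def dispatch(self, item: WorkItem, service: str = "resource") -> str:
        # Human tasks always go to the resource service;
        # automated tasks go to the named bespoke service.
        target = service if item.automated else "resource"
        self.state.setdefault(item.case_id, []).append(item.task)
        return self.services[target](item)


engine = Engine()
engine.register("resource", lambda it: f"offered '{it.task}' to a user")
engine.register("mail", lambda it: f"sent mail for '{it.task}'")

r1 = engine.dispatch(WorkItem("case-1", "Approve order", automated=False))
r2 = engine.dispatch(WorkItem("case-1", "Notify customer", automated=True), service="mail")
```

Keeping the engine unaware of what a service does internally is what makes it possible to add, for instance, a service that drives a laboratory machine without touching the engine itself.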
The Administration Module allows for uploading new specifications, reassigning work items, cancelling cases, etc. The Worklist Handler shows each user the work items that are currently assigned to them. When the user opens one of these work items, the Task Data Input Forms Generator will automatically generate an appropriate web form based on the data types and values of the corresponding task of the specification.
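As a rough illustration of how a form can be derived from a task's typed variables, the following sketch maps a few XML Schema simple types to HTML input elements. The type table, the function name `render_form`, and the markup are assumptions made for this example; they do not reproduce the forms that YAWL actually generates.

```python
from html import escape

# Hypothetical mapping from XML Schema simple types to HTML input types.
INPUT_TYPES = {
    "xs:string": "text",
    "xs:integer": "number",
    "xs:boolean": "checkbox",
    "xs:date": "date",
}


def render_form(task_name, variables):
    """Render a form from a list of (name, xsd_type, value) tuples."""
    rows = []
    for name, xsd_type, value in variables:
        input_type = INPUT_TYPES.get(xsd_type, "text")  # unknown types fall back to text
        if input_type == "checkbox":
            attr = " checked" if value else ""
        else:
            attr = f' value="{escape(str(value))}"'
        rows.append(
            f'<label>{escape(name)} '
            f'<input type="{input_type}" name="{escape(name)}"{attr}></label>'
        )
    body = "\n".join(rows)
    return f'<form data-task="{escape(task_name)}">\n{body}\n</form>'


form = render_form(
    "Select data",
    [("strand", "xs:string", "chr21"), ("chunks", "xs:integer", 4), ("notify", "xs:boolean", True)],
)
```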
All YAWL services are deployed within a servlet container; the standard YAWL distribution is bundled with Apache Tomcat. Hibernate ORM manages the persistence of the system and supports a wide range of relational DBMSs.

Software functionalities
With YAWL, processes can be designed, executed, and analysed. Because of its high modularity, every part of the environment, including all user interfaces, can be modified or replaced by users and developers.

YAWL process editor
As stated above, the YAWL process editor allows for the creation of process specifications covering all perspectives necessary for process automation. The control flow is drawn as a graphical model in the YAWL language, such as those shown in Figs. 1 and 3. In addition to what is visible in the graphical diagram, there are data variables defined for each task, using XML Schema. Task variables can be created and mapped from global variables by simple drag and drop operations. If more complex operations are necessary, XPath and XQuery expressions may be used. Regarding the resource perspective, YAWL can categorise human and non-human resources into work roles, positions, and organisational capability based groupings. Based on these resources, YAWL can distribute work in many different ways [5]. Users and user groupings can be created directly within YAWL, and/or organisational data can be imported from LDAP or other related systems, such as Liferay Portal.
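The idea of mapping case-level data to task variables through path expressions can be sketched as follows. The example uses the limited XPath subset of Python's `xml.etree.ElementTree` as a stand-in for the XPath and XQuery expressions mentioned above; the element names, the `map_inputs` helper, and the sample data are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical case-level (global) data document.
CASE_DATA = """
<Epigenomics>
  <source>run42.fastq</source>
  <strand>chr21</strand>
  <chunkCount>4</chunkCount>
</Epigenomics>
"""


def map_inputs(case_xml, task_name, mappings):
    """Build a task's input document.

    mappings maps each task variable name to a path into the case data.
    """
    case_root = ET.fromstring(case_xml)
    task_root = ET.Element(task_name)
    for var_name, xpath in mappings.items():
        child = ET.SubElement(task_root, var_name)
        child.text = case_root.findtext(xpath)
    return task_root


task_doc = map_inputs(CASE_DATA, "SplitData", {"input": "./source", "parts": "./chunkCount"})
print(ET.tostring(task_doc, encoding="unicode"))
# prints <SplitData><input>run42.fastq</input><parts>4</parts></SplitData>
```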

YAWL runtime environment
For human participants, the YAWL environment supports the automated generation of browser-based forms that display and capture data values of task input and output variables. The layout and content of the automatically generated forms can be further guided with pre-set and user-defined extended attributes that may be assigned at design time and applied during the pre-rendering phase of the form at runtime. These features are particularly useful for rapid prototyping purposes, and may be fully replaced with a pluggable set of user-defined forms as needed.

Exception handling
YAWL offers unique support for dynamic processes and exception handling through the Worklets/Exlets approach [6]. Worklets are small process definitions that replace placeholders in a higher-level process specification at runtime, based on the context of each case, following a set of rules that can be extended on-the-fly while the process is executing. Exlets are an extensible set of operational primitives and compensation processes that, when defined, will automatically execute in the event of a process error or deviation, whether or not the error was anticipated at design time [6].
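A minimal sketch of context-based worklet selection is given below. It assumes a plain priority list of rules in which later, more specific rules win; the structure of the rule sets used by the actual Worklet Service is not described here, and the class, rule conditions, and worklet names are hypothetical.

```python
from typing import Callable, List, Tuple

# A rule pairs a condition on the case data with the name of a worklet.
Rule = Tuple[Callable[[dict], bool], str]


class WorkletSelector:
    def __init__(self, default: str) -> None:
        self.default = default
        self.rules: List[Rule] = []

    def add_rule(self, condition: Callable[[dict], bool], worklet: str) -> None:
        # Rules may be added while cases are running; later rules take precedence.
        self.rules.append((condition, worklet))

    def select(self, case_data: dict) -> str:
        for condition, worklet in reversed(self.rules):
            if condition(case_data):
                return worklet
        return self.default


selector = WorkletSelector(default="StandardCleanse")
selector.add_rule(lambda d: d["noise"] > 0.2, "AggressiveCleanse")

case = {"noise": 0.5, "platform": "legacy"}
before = selector.select(case)

# A new rule is added at runtime, refining the selection for one context.
selector.add_rule(lambda d: d["noise"] > 0.2 and d["platform"] == "legacy", "LegacyCleanse")
after = selector.select(case)
```

The same case data selects a different worklet before and after the rule set is extended, which is the behaviour the placeholder mechanism relies on.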

Process logging and mining
All system and process actions are logged, and that data is made available in a variety of formats, including the open XES format. This provides easy integration into process mining tools, such as ProM, for post-execution analysis and diagnosis.
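To give an impression of what an XES event log contains, the sketch below serialises a few events into the basic XES structure of a log with traces and events, using the standard `concept:name`, `lifecycle:transition`, and `time:timestamp` attribute keys. The function `to_xes` and the sample events are hypothetical, and the output omits the extension and classifier declarations that a complete XES file would normally carry.

```python
import xml.etree.ElementTree as ET


def to_xes(events):
    """Group (case_id, task, transition, iso_timestamp) tuples into XES traces."""
    log = ET.Element("log", {"xes.version": "1.0"})
    traces = {}
    for case_id, task, transition, timestamp in events:
        trace = traces.get(case_id)
        if trace is None:
            # One trace per case, named after the case identifier.
            trace = ET.SubElement(log, "trace")
            ET.SubElement(trace, "string", key="concept:name", value=case_id)
            traces[case_id] = trace
        event = ET.SubElement(trace, "event")
        ET.SubElement(event, "string", key="concept:name", value=task)
        ET.SubElement(event, "string", key="lifecycle:transition", value=transition)
        ET.SubElement(event, "date", key="time:timestamp", value=timestamp)
    return log


xes_log = to_xes([
    ("case-1", "Select data", "complete", "2020-01-01T10:00:00+00:00"),
    ("case-2", "Select data", "complete", "2020-01-01T10:05:00+00:00"),
    ("case-1", "Split data", "complete", "2020-01-01T10:07:00+00:00"),
])
xes_text = ET.tostring(xes_log, encoding="unicode")
```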

Illustrative example
The following example serves to illustrate the use of the YAWL environment to define and execute scientific processes. This example is based on the epigenomics workflow described by Juve et al. [7], which automates the generation and display of epigenetic state data from human cells. Many more thorough exemplars can be found, for example, in [5,8-12].
The process specification consists of a main (or primary) model and a sub-process model, as shown in Fig. 3. The process begins with the selection of DNA sequence data sources and the choice of a genome strand to isolate and display. The data is then split into a chosen number of chunks, each to be processed in parallel.
The Process Chunks sub-process is an example of a multiple instance sub-process; that is, it is dynamically instantiated multiple times, one instance for each data chunk, depending on the number of chunks to be processed. Each sub-process instance is assigned one chunk, and begins by first cleansing it to remove noisy or contaminated segments, then converting the data to the binary format required for input into the Map genome task, which maps the binary DNA data into the relevant selected strand location on a reference genome.
Once all of the sub-processes have completed, each mapping is merged into a global genome map, then indexed by region. Finally, information from the specified genome region is extracted, then converted into the proper data format for display within a graphical user interface.
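The split, parallel processing, and merge steps described above follow a pattern that can be sketched in a few lines. This is only an analogy in plain Python, not a YAWL specification: the cleansing and mapping logic are trivial stand-ins, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(reads, n_chunks):
    """Partition reads into n_chunks contiguous slices of nearly equal size."""
    size, extra = divmod(len(reads), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(reads[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """One sub-process instance: cleanse a chunk, then 'map' it."""
    cleansed = [read for read in chunk if "N" not in read]  # stand-in for cleansing
    return {read: len(read) for read in cleansed}           # stand-in for convert and map


def run(reads, n_chunks):
    # One worker per chunk, mirroring one sub-process instance per chunk.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partial_maps = list(pool.map(process_chunk, split(reads, n_chunks)))
    merged = {}
    for partial in partial_maps:  # merge step, entered only after all instances finish
        merged.update(partial)
    return dict(sorted(merged.items()))  # stand-in for the indexing step


genome_map = run(["ACGT", "ACNT", "GG", "TTA", "CAGT", "N"], n_chunks=3)
```

The number of instances is decided at runtime by `n_chunks`, while the definition of a single instance (`process_chunk`) is written only once.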
Like many scientific workflows, the process is quite linear, where the outputs of one task become the inputs for the next task in the sequence. The ability to succinctly define a multiple instance sub-process, which will spawn a number of instances depending on a matrix of runtime data partition values and available computing resources, effectively creates a multi-parallel, distributed execution network, while negating the need for an over-complication of the model. YAWL provides for the work of a task to be assigned to any resource, human or software. The first two tasks in this workflow are shaded to denote their execution by human participants. Further, they are shaded with the same colour to denote that the participant assigned the second task (e.g. a member of the Epigenetic Investigator role) must be the same participant who completed the first task; this is an example of the retain familiar resourcing pattern [4]. The remaining tasks are annotated with triangular 'play' icons to denote that those tasks are automated. In this example, each automated task is assigned to a scripting engine where bespoke routines are called to operate on the input data streams. In this way, work can be distributed across as many computing resources as are available to efficiently run those routines.
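The retain familiar constraint mentioned above can be illustrated with a small allocator. The sketch assumes a simple policy of choosing the least loaded role member when no constraint applies; the `Allocator` class, the participant names, and that default policy are hypothetical and are not taken from the YAWL Resource Service.

```python
class Allocator:
    def __init__(self, role_members):
        self.role_members = role_members  # role name -> list of participants
        self.history = {}                 # (case id, task) -> participant
        self.load = {}                    # participant -> number of allocated items

    def allocate(self, case_id, task, role, familiar_with=None):
        """Allocate a task; reuse whoever did `familiar_with` in the same case."""
        previous = self.history.get((case_id, familiar_with))
        if previous is not None:
            chosen = previous
        else:
            # Default policy (an assumption): the least loaded member of the role.
            chosen = min(self.role_members[role], key=lambda m: self.load.get(m, 0))
        self.history[(case_id, task)] = chosen
        self.load[chosen] = self.load.get(chosen, 0) + 1
        return chosen


allocator = Allocator({"Epigenetic Investigator": ["ana", "ben"]})
role = "Epigenetic Investigator"
first_1 = allocator.allocate("case-1", "Select data", role)
first_2 = allocator.allocate("case-2", "Select data", role)
second_2 = allocator.allocate("case-2", "Choose strand", role, familiar_with="Select data")
second_1 = allocator.allocate("case-1", "Choose strand", role, familiar_with="Select data")
```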
The exception handling capabilities of YAWL [6] (not depicted in Fig. 3) allow exceptions to be detected and effectively handled in real time, for example if the data cleanse routine failed to remove errant data or if there was a failure in a mapping operation, so that the process can continue unimpeded. These capabilities negate the need to cancel and restart the entire process when an error occurs.
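The effect of handling a failure locally instead of restarting the whole process can be sketched as follows. The handler registry, the `MappingError` exception, and the compensation that skips a failed chunk are assumptions made for this example and only loosely mirror the exlet concept.

```python
class MappingError(Exception):
    """Raised by the hypothetical mapping routine when a chunk cannot be mapped."""


def map_chunk(chunk):
    if not chunk:
        raise MappingError("empty chunk")
    return {read: len(read) for read in chunk}


def skip_chunk(chunk, exc):
    # Compensation: contribute nothing for this chunk and let the case continue.
    return {}


def run_with_handlers(task, data, handlers):
    """Run task(data); on failure, apply the first handler matching the exception type."""
    try:
        return task(data)
    except Exception as exc:
        for exc_type, handler in handlers:
            if isinstance(exc, exc_type):
                return handler(data, exc)
        raise  # no handler defined: the failure propagates as before


handlers = [(MappingError, skip_chunk)]
results = [run_with_handlers(map_chunk, chunk, handlers) for chunk in (["ACGT"], [], ["GG"])]
```

The failing middle chunk yields an empty mapping while the other two chunks complete normally, so the remaining work is not lost.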

Impact
YAWL has been downloaded more than 250,000 times, from over 170 countries. From prototypical beginnings, there have been more than 40 formal version updates, including three major version releases: v2, which included a new resource perspective, new administration and user interfaces, and new automated forms; v3, which incorporated a new and enhanced process editor that also supports plug-in extensions; and v4, which included a new control panel app that fully encapsulated the environment into a single entity. The latest version, v4.3.1, contains new security features, as well as a number of minor enhancements and updates.
The YAWL language and environment have been used extensively for research purposes. A primary factor in the applicability of YAWL for research is its extensibility: it is relatively easy to develop a new extension, service or enhancement and plug it into the environment. Each core component of the YAWL environment has its own extensive Application Programming Interface (API). A task in a YAWL process can be delegated at runtime to any service, system, application, person or code module, which allows a task in a process model to represent any applicable action.
A Google Scholar search for "yawl business process management" yielded approximately 6000 academic papers citing YAWL in some way. For instance, YAWL has been used in research efforts involving: the study of process language transformations [13,14]; the definition of formal semantics of other languages using YAWL formalisations [15]; approaches to process model analysis and verification [11,16]; process flexibility extensions and techniques [8,17]; exception handling techniques during process execution [6]; process configuration methods [10]; case studies of organisational processes using YAWL [9,12,18]; and process simulation research [19], to select but a few.
While the processes in BPMSs such as YAWL are driven by the control flow (as illustrated by the process definitions in Figs. 1 and 3), in e-Research there are processes or workflows that are purely driven by the availability of data. Popular scientific workflow systems in e-Research include Taverna, Kepler, and others. Nevertheless, a project by INRIA involving high-performance computing chose YAWL for process support due to its dynamic exception handling capabilities [20].

Conclusion
The YAWL system continues to evolve as ideas for new uses and applications are realised. YAWL in the Cloud is one recent project where we are exploring ways to deploy a load-balanced array of YAWL engines in a cloud environment in an effort to provide BPMS services and benefits to a number of organisations, removing the need for those organisations to deploy their own BPMS locally [21].
Blockchain integration is another developing direction, where distributed ledger technologies can be leveraged to support interorganisational workflow without the need for a trusted intermediary [22].
Recent efforts and emerging pathways are indicative of the ongoing applicability of the YAWL environment. While there have been many benefits realised and challenges met since its first release, the YAWL system has maintained its relevance in many varied research and learning domains.