Process mining for healthcare: Characteristics and challenges

Process mining techniques can be used to analyse business processes using the data logged during their execution. These techniques are leveraged in a wide range of domains, including healthcare, where they focus mainly on the analysis of diagnostic, treatment, and organisational processes. Despite the huge amount of data generated in hospitals by staff and machinery involved in healthcare processes, there is no evidence of a systematic uptake of process mining beyond targeted case studies in a research context. When developing and using process mining in healthcare, distinguishing characteristics of healthcare processes such as their variability and patient-centred focus require targeted attention. Against this background, the Process-Oriented Data Science in Healthcare Alliance has been established to propagate the research and application of techniques targeting the data-driven improvement of healthcare processes. This paper, an initiative of the alliance, presents the distinguishing characteristics of the healthcare domain that need to be considered to successfully use process mining, as well as open challenges that need to be addressed by the community in the future.


Introduction
Innovations make healthcare better, more affordable and more efficient. Developments such as new technologies and business models help to move healthcare forward [1]. In addition, healthcare systems across the world are confronted with unprecedented challenges, including the permanent and rapid adaptation of clinical processes based on emerging scientific evidence [2] and the provision of high-quality care with limited resources [3][4][5]. Within this context, healthcare organisations, such as hospitals, are aware of the need to manage and improve both their clinical processes (e.g., care pathways describing the treatment of a particular medical condition over time) and their organisational/administrative processes (e.g., billing processes) [6,7].
Process execution data are a valuable information source to support the management and improvement of healthcare processes. Typically, healthcare organisations make intensive use of Health Information Systems (HISs), such as a hospital information system. During the execution of a process, several entries are recorded in the HISs (e.g., when a patient was registered or was subject to a clinical examination by a physician). The entries in the databases of these HISs can be leveraged to generate an event log describing the sequence of activities that were performed, when they were executed, by whom and for whom (e.g., for which specific patient) [8]. As the event log reflects how a process has been executed in reality, it can support clinicians, managers of healthcare organisations, and other decision-makers with a wide range of process-related questions in the medical domain.
To answer process-related questions, process mining techniques can be of great value. Process mining is a set of techniques used in many domains, including healthcare, to retrieve valuable insights from an event log [7][8][9]. A multitude of process mining techniques have been developed in industry and academia [8,10], which enable healthcare stakeholders to identify the actual order of activities in a process [11], to determine the conformance between an existing (e.g., normative) model and reality [12], and to provide insights into the involvement of resources in a process [8,13].
Compared to alternative approaches, such as process mapping exercises with staff members [14], process mining takes data about the real-life behaviour of a process as a starting point. In this way, process mining can support healthcare institutions in achieving each element of the quadruple aim for healthcare improvement [15]: (i) improving population health, e.g., by supporting the analysis and improvement of care pathways; (ii) improving patient experience, e.g., by highlighting how a process can be streamlined from the patient's perspective; (iii) reducing costs, e.g., by making bottlenecks explicit; and (iv) improving the work-life balance of healthcare professionals, e.g., by enabling the analysis of resource involvement and requirements in a healthcare process [15,16]. Besides supporting the data-driven management and improvement of healthcare processes, process mining also has the potential to support the resilience of the healthcare system by providing detailed insights into how processes are being executed within a particular context [2].
Despite the vast potential of process mining [17], the distinguishing characteristics of the healthcare domain require targeted attention when developing and using process mining techniques. For instance, healthcare processes are typically characterised by high levels of variation [18], and strongly depend on complex decisions made by knowledge workers, such as physicians, who often have a high degree of autonomy [19,20]. To date, there is no evidence of a systematic uptake of process mining beyond targeted case studies in a research context [7]. To induce a widespread and systematic adoption of process mining in the healthcare domain, targeted methods and techniques that explicitly take the specific domain characteristics into account are required.
Our paper aims to convey a unified perspective on the distinguishing characteristics and associated key challenges of the Process Mining for Healthcare (PM4H) domain. The distinguishing characteristics add to the complexity of using process mining within a healthcare context. The associated challenges need to be tackled by the community, including researchers and practitioners, in order to structurally establish process mining in healthcare as a powerful instrument to support evidence-based process analysis and improvement. Tackling the highlighted challenges requires efforts at the level of both fundamental research and translational research. In that sense, this paper may inspire and encourage researchers to contribute to this exciting field.
This paper is an initiative of the Process-Oriented Data Science for Healthcare Alliance, which is affiliated with the IEEE Task Force on Process Mining. The alliance aims to promote research and knowledge sharing to encourage targeted innovation in process mining in healthcare, as well as its effective application in real-life settings. With this paper, the alliance expresses its firm ambition to strengthen the domain's identity by making both its distinguishing characteristics and key challenges explicit. The distinguishing characteristics and challenges have emerged from extensive consultation with a wide range of experts on the topic, for instance during panel discussions at the PODS4H workshop at the International Conference on Business Process Management and the International Conference on Process Mining.
The remainder of this paper is structured as follows. Section 2 introduces the basic concepts of process mining in healthcare. Section 3 outlines ten distinguishing characteristics which are especially relevant for process mining in healthcare, and Section 4 presents ten associated challenges that have to be considered by the community. Table 1 presents an overview of the distinguishing characteristics and challenges in the PM4H domain. Finally, the paper ends with some concluding remarks in Section 5.

Process Mining in Healthcare: Basic Concepts and Applications
This section introduces the key concepts of process mining (Section 2.1). Moreover, some applications of process mining in healthcare are presented (Section 2.2).

Process Mining
Process mining is a family of techniques focused on gaining valuable insights from data that processes generate while being executed. It works as a bridge between process science (which includes areas such as business process management and operations research) and data science (which includes areas such as data mining and predictive analytics), resulting in methods to analyse processes through data [8]. Process mining is domain-agnostic, i.e., process mining techniques can be applied in any industry where processes are present and data representing them are available. Healthcare, the focus of this paper, is one domain where the use of process mining is growing [10,21].
Fig. 1 positions process mining in healthcare in a wider context. Process mining centres around processes, which can be represented using a process model, such as a model representing the sequence of steps in a process and the different paths a process can take [22]. The order of activities in a process can be visualised in various ways, e.g., using flowcharts [23] or Business Process Modeling Notation (BPMN) [24]. Fig. 2 exemplifies the use of BPMN to model the trajectory of sepsis patients. Note that the model is purposefully simplified for illustrative purposes.
Nowadays, many healthcare processes are, at least partially, supported by Health Information Systems (HISs). HISs that commonly support healthcare processes are Electronic Health Record (EHR) systems provided by vendors such as Epic, Cerner, MEDITECH, Allscripts, athenahealth, IBM, McKesson, and Siemens. Such systems record data about the execution of processes in a healthcare organisation [8]. This process execution data can be leveraged to create an event log.
Event logs containing process execution data are the primary input for process mining algorithms [8]. An event log is composed of cases representing different process instances, e.g., the execution of a treatment process for a specific patient. Each case is composed of a sequence of events, where an event could refer to the completion of a particular activity in the treatment process. As illustrated in Table 2, an event log typically records the following information for each event: (a) an identifier of each case ('Case id'), (b) the activities that each case included ('Activity'), and (c) a reference to when each activity was executed ('Timestamp'). Besides this information, an event log can also contain information regarding the type of event ('Transaction type'), the resource associated with an event ('Resource'), as well as other attributes regarding the activity or case. Consequently, the key components of an event log are:
• Case id: a process instance contained in the data, which could be a patient being treated in the hospital in a clinical process or an invoice in an organisational process. In Table 2, each case represents a sepsis patient. Cases can be recognised by their unique identifier, e.g., patients 253 and 255.
• Activity: a step of the process, which could be 'Check vital signs' in a clinical process or 'Send payment reminder' in an organisational process. In Table 2, the activities include 'ER Registration', representing the action of registering the patient in the emergency ward, and 'CRP', representing the registration of a C-reactive protein measurement in the system.
• Timestamp: time at which the event took place. In Table 2, the first event took place on April 14th, 2021 at 11:33:50.
• Transaction type: the state when the event was recorded. In Table 2, all the events were recorded once they were completed ('complete' state). Other possible states are 'start', 'resume' or 'schedule', among others.
• Resource: refers to the resource associated with the event. This could refer, for instance, to a physician, a healthcare professional or a medical device. In Table 2, the resource information tells us, for example, that 'Nurse 1' has registered the triage information of patient 253, while 'Physician 02' recorded the admission of patient 253 to the normal care ward.
• Other attributes: additional case or event attributes may also be recorded in an event log, such as the hospital unit where the patient has received care, the patient's vital signs, etc.
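The event log components listed above can be sketched in a few lines of Python. This is a minimal illustration rather than a standard log format: the `Event` class, the `to_traces` helper, the 'ER Triage' label and all row values are assumptions made for the example and do not reproduce the content of Table 2.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One row of an event log (hypothetical structure for illustration)."""
    case_id: str
    activity: str
    timestamp: datetime
    transaction_type: str = "complete"
    resource: str = ""


def to_traces(events):
    """Group events per case and order each case by timestamp."""
    by_case = defaultdict(list)
    for event in events:
        by_case[event.case_id].append(event)
    return {
        case_id: [e.activity for e in sorted(case_events, key=lambda e: e.timestamp)]
        for case_id, case_events in by_case.items()
    }


# Illustrative rows only; these are not the actual rows of Table 2.
LOG = [
    Event("253", "ER Triage", datetime(2021, 4, 14, 11, 40), resource="Nurse 1"),
    Event("253", "ER Registration", datetime(2021, 4, 14, 11, 33, 50)),
    Event("255", "ER Registration", datetime(2021, 4, 15, 9, 5)),
    Event("253", "CRP", datetime(2021, 4, 14, 12, 10)),
    Event("255", "CRP", datetime(2021, 4, 15, 9, 50)),
]

TRACES = to_traces(LOG)
print(TRACES["253"])
```

Grouping by case identifier and sorting by timestamp yields one activity sequence (trace) per case, which is the form that most of the techniques discussed below take as input.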
Event log creation in healthcare environments faces considerable challenges due to heterogeneous data sources (e.g., mobile health data) and distributed healthcare providers [27]. Moreover, it is recognised that some healthcare processes are not directly supported by HISs, i.e., unplugged processes [28]. These types of processes require novel methods to obtain an event log, such as the video tagging method to create process data from surgical procedure recordings [29]. Medical vocabularies and ontologies are also important assets that can be helpful when creating event logs in healthcare environments [27].
Using the event log, various process mining types can be performed in order to generate valuable process-related insights. Three prominent types of process mining are [8]:
• Discovery: these algorithms are useful to obtain process models reflecting process behaviour from an event log [8]. Many algorithms focus on discovering the order of activities in the process (also known as control-flow, trajectories, activity paths, or care pathways) from the data. Besides the large number of control-flow discovery algorithms, other discovery algorithms help to gain knowledge about how resources work throughout the process, focusing on role discovery [30,31], social networks [32,33], and task prioritisation patterns [34]. Examples of use cases include the discovery of models of gynecological oncology processes showing relations between the organisational units involved and the therapeutic pathways that patients received [35,36], or the discovery of collaboration patterns between the physician, nurse and dietitian involved in diabetes treatment [37]. Fig. 3 illustrates the discovered trajectories of sepsis patients in a hospital.
• Conformance checking: these algorithms require a process model, either obtained by means of a discovery algorithm or previously designed, and aim to compare the behaviour in the event log with the behaviour in that process model [12,36]. Hence, conformance checking algorithms help to evaluate whether the process is being executed as described in the process model, as well as to detect deviations between the observed behaviour in the event log and the process model. For instance, Huang et al. [9] use conformance checking to detect deviations from the unstable angina pathway. The analysis highlighted the presence of unexpected, early, delayed and absent activities [9].
• Enhancement: these algorithms help to enrich and extend an existing process model using process data [8]. One enhancement type is model repair, which allows the modification of a process model based on event logs [38]. Another type is model extension, where a process model is enriched with information such as time and roles. Some examples are the repair of declarative process models representing clinical practice guidelines in a urology department [39], and the extension of a process model with the duration of surgical procedure stages based on Robot-Assisted Partial Nephrectomy (RAPN) data [40].
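To make the discovery type more concrete, the sketch below counts how often one activity directly follows another, which gives a directly-follows graph, one of the simplest control-flow representations that can be derived from an event log. It is not the algorithm of any particular tool or of the cited studies; the function name and the example traces are assumptions made for illustration.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


# Hypothetical traces, one list of activities per case.
TRACES = [
    ["ER Registration", "ER Triage", "CRP", "Release"],
    ["ER Registration", "ER Triage", "CRP", "Admission", "Release"],
    ["ER Registration", "CRP", "ER Triage", "Release"],
]

DFG = directly_follows(TRACES)

# Print the edges, most frequent first.
for (a, b), count in sorted(DFG.items(), key=lambda item: (-item[1], item[0])):
    print(f"{a} -> {b}: {count}")
```

Even with three traces the graph already contains eight distinct edges, which hints at why highly variable processes quickly become hard to read when every observed transition is drawn.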
To facilitate the application of process mining in practice, various software tools have been developed, including commercial software such as Disco, Celonis and Apromore. There are also open-source alternatives, such as ProM, and libraries for programming languages, such as pm4py for Python and bupaR for R.

Applications of Process Mining in Healthcare
Over the last years, process mining has been used for various use cases in healthcare, as demonstrated in several literature reviews on this topic [10,21,41]. While Section 2.1 presents various process mining examples, this subsection highlights some of the most relevant healthcare questions that process mining can answer.
Fig. 1. Positioning of process mining in healthcare, based on [8].
Process variant exploration for clinical pathway analysis is a frequently recurring use case for PM4H [21]. These techniques study the difference between several variants of the execution of the same clinical pathway, which could be approached from a control-flow or performance perspective [42]. Clinical pathways are highly flexible as all patients in need of the same treatment come with different comorbidities and complications. Moreover, clinical pathways involve complex decision-making due to their knowledge-intensive nature, are performed by a network of specialists, and continuously evolve due to innovations and unforeseen situations [43]. Identifying differences between groups of pathway executions through process variant analysis helps to decide whether process improvement is needed, and if so, which changes can make the process more efficient [44]. An example of this analysis is Caron et al. [45], where data of 1143 gynecologic oncology patients were analysed in two subsets: one with patients receiving radiotherapy and the other with patients receiving chemotherapy. There are various challenges related to process variant analysis, including the comparison of process variants from the resource perspective, the verification of guideline compliance, the discovery of how adverse events are faced, the analysis of process variants across the whole patient journey (including prevention, pre-hospital care, hospital treatment and rehabilitation), and the identification of useful variants for process improvement [42][43][44][45][46].
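From a control-flow perspective, a basic form of process variant analysis can be sketched as follows: each distinct activity sequence is treated as a variant, and the variants observed in two patient groups are compared. The cohorts, activity labels and function names are hypothetical and are not taken from the cited studies.

```python
from collections import Counter


def variant_counts(traces):
    """Count how many cases follow each distinct activity sequence."""
    return Counter(tuple(trace) for trace in traces)


def compare_variants(group_a, group_b):
    """Split variants into those seen in only one group and those shared."""
    a, b = variant_counts(group_a), variant_counts(group_b)
    return {
        "only_a": sorted(set(a) - set(b)),
        "only_b": sorted(set(b) - set(a)),
        "shared": sorted(set(a) & set(b)),
    }


# Hypothetical cohorts with invented activity labels.
COHORT_A = [
    ["Consult", "Imaging", "Surgery"],
    ["Consult", "Surgery"],
    ["Consult", "Imaging", "Surgery"],
]
COHORT_B = [
    ["Consult", "Imaging", "Surgery"],
    ["Consult", "Imaging", "Imaging", "Surgery"],
]

RESULT = compare_variants(COHORT_A, COHORT_B)
print(RESULT)
```

A comparison of this kind only covers the order of activities; comparing variants on performance or resource involvement would require timestamps and resource attributes in addition to the activity labels.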
Process mining also has applications for disease trajectory modeling, which refers to models that characterise the progress of a disease over time and compare the disease evolution depending on patient attributes such as age, co-morbidities and received treatments [47,48]. An example is the study conducted by De Oliveira et al. [49], where data of 76,523 sepsis patients were used to uncover the most common diagnoses that patients received before and after the sepsis diagnosis, in order to better understand how to identify and manage sepsis. Using process mining, they developed a bow-tie visualisation, which allowed them to discover that pneumonia and gastrointestinal disorders commonly occurred before sepsis, while septicaemia, pneumonia and urinary tract infections occurred after sepsis. Challenges in disease trajectory modeling using process mining are the development of models which are easy to understand (i.e., clear visualisations of the trajectories), and the comparison of these models with clinical guidelines using conformance checking [48].
Fig. 2. Process model representing the trajectory of sepsis patients, based on Mannhardt [25]. The model was created using BPMN as a process modelling language. The start event (○) indicates the start point of the process, and the end event (circle at the end of the model) indicates the end point of the process. The gateways represent alternative paths: the parallel gateway means that all the paths should be followed, and the exclusive gateway means that only one path can be followed.
The aforementioned use cases only aim to illustrate the opportunities that process mining offers to healthcare. There are various other highly relevant questions for which process mining can generate relevant insights. These questions include:
• How does the process flow of patients with a particular medical complication differ from other patients? Every patient is unique, which implies that patients with the same illness respond differently to the same treatment due to comorbidities and other contextual factors [50]. Variations in the patient trajectory can be discovered with process mining algorithms, which help to characterise groups of similar patients (in terms of medical history, laboratory tests, etc.), allowing healthcare professionals to gain profound insights into the treatment trajectory of various patient types.
• To what extent is the care pathway for a particular medical condition followed in practice? With the rise of evidence-based medicine, protocols and clinical guidelines are developed to provide clarity on the necessary steps when diagnosing and treating a medical condition [51]. However, it is difficult to determine the implementation and effectiveness of clinical protocols and guidelines in reality, i.e., whether they are followed in practice. Process mining allows practitioners and researchers to perform this type of analysis, which can help to understand major deviations from clinical guidelines, as well as to identify areas for improvement in clinical guidelines and protocols.
• Where are the bottlenecks in a healthcare process?
Time is often an important variable in healthcare. Process mining makes it possible to analyse the time perspective of processes through indicators such as waiting times and activity duration, which together help to detect bottlenecks in a process. Having this information on healthcare processes, such as those in an emergency department, can drive decision-making to, for example, improve the availability of boxes and reduce waiting times [52].

• How do multiple clinical experts interact in a care process?
Collaboration between clinicians and other healthcare staff is daily practice in healthcare [53]. Hence, when analysing a care process, various healthcare professionals are likely to be involved when treating a disease. Process mining provides tools to analyse collaboration patterns among healthcare professionals within a process, e.g., by identifying handovers of work [54].
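As an illustration of the bottleneck question in the list above, the following sketch computes the mean elapsed time between consecutive events and reports the slowest transition. The activity names, timestamps and function names are assumptions made for the example. Note that when only 'complete' events are recorded, the elapsed time between two events mixes waiting time with the duration of the second activity.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_elapsed_minutes(log):
    """Mean minutes between consecutive events, per pair of activities.

    `log` maps a case id to a list of (activity, timestamp) tuples.
    """
    elapsed = defaultdict(list)
    for events in log.values():
        ordered = sorted(events, key=lambda event: event[1])
        for (a, t_a), (b, t_b) in zip(ordered, ordered[1:]):
            elapsed[(a, b)].append((t_b - t_a).total_seconds() / 60)
    return {pair: mean(values) for pair, values in elapsed.items()}


def at(hour, minute):
    """Helper that builds a timestamp on one (arbitrary) example day."""
    return datetime(2021, 4, 14, hour, minute)


# Hypothetical emergency department cases.
LOG = {
    "c1": [("Registration", at(8, 0)), ("Triage", at(8, 10)), ("Consultation", at(9, 40))],
    "c2": [("Registration", at(8, 30)), ("Triage", at(8, 50)), ("Consultation", at(10, 0))],
}

MEAN_ELAPSED = mean_elapsed_minutes(LOG)
BOTTLENECK = max(MEAN_ELAPSED, key=MEAN_ELAPSED.get)
print(BOTTLENECK, MEAN_ELAPSED[BOTTLENECK])
```

The same pairwise pass over consecutive events could count changes of resource instead of elapsed time, which would give a simple view of handovers of work.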
These questions illustrate that PM4H can support healthcare professionals in answering a wide variety of process-related questions. Against this background, the next section will outline distinguishing characteristics of healthcare processes. Afterwards, key challenges for the PM4H community are discussed.

Distinguishing Characteristics
This section outlines ten distinguishing characteristics of healthcare processes, which have implications for PM4H. While we do not claim that these characteristics are exclusive to the healthcare domain, we consider them highly relevant for the use of process mining in a healthcare context. The distinguishing characteristics are discussed separately in the remainder of this section, but, in practice, they are also interconnected, which makes it more complex to take them into consideration. Moreover, the distinguishing characteristics give rise to specific challenges, which need to be taken into account when developing process mining techniques.

D1: Exhibit Substantial Variability
Healthcare processes are complex, in part because they tend to exhibit significant variability [4,18,55]. Several factors contribute to this intrinsic presence of variability in healthcare processes. These factors include the vast diversity of activities that can typically be executed, the fact that several subprocesses can be executed simultaneously (e.g., in case of polytrauma), and the influence of differences in the personal preferences/characteristics of patients, clinicians and other healthcare professionals (e.g., impacting choices made in the treatment process) [18,56]. The combination of such factors tends to make almost all cases (e.g., a patient in a clinical process) different. For instance, given the patient's pathologies and co-morbidities, a different set of activities might need to be executed in comparison with the standard pathway. Moreover, patients can respond very differently to particular treatments, which affects the order or type of activities that follow. It should also not be forgotten that the patient is the ultimate decision maker, who may accept or decline a particular treatment according to beliefs, fears or perceptions regarding quality of life.
When an event log of a highly variable healthcare process is used to discover a control-flow model, control-flow discovery algorithms are likely to generate an unstructured model, often referred to as a spaghetti model [8]. Classic process mining techniques are not well prepared to deal with unstructured processes and, as a consequence, generate process models which are extremely challenging to interpret. A common approach to deal with this issue is to remove or reduce the variability in the event log by means of abstraction techniques such as filtering or aggregation, e.g., using trace clustering techniques [57] and semantic aggregation of activities [58,59]. However, this approach generates process models that only cover a small part of the problem at hand. Such approaches might not be sufficient for many real-world healthcare applications because they only provide a partial view of the process and may hide valuable infrequent behaviour. Hence, PM4H researchers should be aware of the variability issue when providing solutions, tools and frameworks to understand and deal with this variability.
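The trade-off described above can be made tangible with a small sketch that keeps only the most frequent trace variants and reports how much of the log the filtered view still covers. The activity labels A to D, the threshold `k` and the function name are hypothetical; this is one simple filtering strategy, not the trace clustering or semantic aggregation techniques cited above.

```python
from collections import Counter


def keep_top_variants(traces, k):
    """Split traces into those following one of the k most frequent variants
    and those that would be filtered out."""
    counts = Counter(tuple(trace) for trace in traces)
    top = {variant for variant, _ in counts.most_common(k)}
    kept = [trace for trace in traces if tuple(trace) in top]
    dropped = [trace for trace in traces if tuple(trace) not in top]
    return kept, dropped


# Hypothetical log: one dominant variant and three variants seen only once.
TRACES = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "C", "B"],
    ["A", "B", "D", "C"],
    ["A", "D"],
]

KEPT, DROPPED = keep_top_variants(TRACES, k=1)
COVERAGE = len(KEPT) / len(TRACES)
print(f"coverage after filtering: {COVERAGE:.0%}")
print("dropped variants:", sorted(set(map(tuple, DROPPED))))
```

In this example the filtered log yields a clean single-path model but represents only half of the cases, and the dropped traces are exactly the ones that would need a closer look.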

D2: Value the Infrequent Behaviour
While infrequent behaviour could be considered as noise in a general scenario, it can be a source of valuable knowledge in the healthcare domain. Healthcare is known for being especially prone to workarounds, i.e., intentional deviations from prescribed practices [60]. Therefore, infrequent behaviour typically needs to be considered in PM4H. For example, nurses must check the vital signs of a patient before a consultation with a physician, and should immediately register the scores in the HIS. However, an analysis of the process might show that nurses keep track of the scores on a notepad and insert all the information later in their shifts [61,62]. Such workarounds provide insights into common inefficiencies and obstacles that healthcare actors face in their daily work. These insights provide a basis for a thorough analysis, enabling healthcare organisations to improve their processes [63].
In the aforementioned example, a shortage of computers and significant time pressure before the consultation with a physician could explain workarounds, providing valuable input for improvement actions [61].
Workarounds can also highlight that different paths through a treatment process lead to the same outcome, providing information about relevant treatment variations to treat a particular disease [64].
It follows that PM4H researchers and practitioners must go beyond simply filtering out infrequent behaviour from the event log. Instead, they should try to understand why infrequent behaviour is observed and what this could mean. They should be aware that focusing on models of the typical execution of the process can result in blind spots, causing them to miss important opportunities for process improvement.

D3: Use Guidelines and Protocols
Over the past decades, evidence-based medicine has emerged as a process in which researchers and practitioners in the field of medicine continuously search for unbiased clinically-relevant information [51]. The evidence-based medicine paradigm has been accompanied by an increasing presence of clinical practice guidelines and protocols [65,66]. Clinical practice guidelines are systematically structured mechanisms which assist practitioners in determining the appropriate healthcare procedures for specific clinical circumstances [67] and, hence, constitute reference processes. The abundant presence of guidelines and protocols provides rich opportunities for process mining in the healthcare domain, compared to other domains where reference processes are absent [68]. These opportunities are twofold. Firstly, guidelines and protocols can facilitate the application of process mining. For instance, the available guidelines can be seen as prescriptive models to which the actual execution of the process can be compared using conformance checking algorithms [69] and similarity-based techniques [70][71][72][73]. In this way, PM4H practitioners can use the structured information provided by guidelines to select relevant events during event log generation [27], and to formally define the activities, with the associated activity labels, that will be considered by process discovery algorithms when generating the models [74,75]. Secondly, process mining can provide the evidence required to improve guidelines and protocols by assessing their effectiveness and efficiency. Similarly, this generated evidence can be used to customise existing global standardisation efforts to the specific characteristics of local contexts in an affordable and effective way [76].
Fig. 3. Process map discovered using the Disco software and the sepsis patients data [26].
In summary, PM4H researchers and practitioners should be aware of the existing guidelines and protocols in their specific setting, such that they can be maximally leveraged during the preparation and analysis of an event log.
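As a rough illustration of treating a guideline as a prescriptive model, the sketch below checks traces against two kinds of rules: activities that must occur, and activities that must precede others. It is a deliberately simplified, rule-based stand-in for the conformance checking algorithms cited above. The guideline rules, activity labels and function name are hypothetical and carry no clinical meaning.

```python
def check_trace(trace, required, precedence):
    """Return a list of deviations of one trace from a rule-based guideline.

    `required` lists activities that must occur; `precedence` lists
    (before, after) pairs where `before` must occur ahead of `after`.
    """
    deviations = []
    for activity in required:
        if activity not in trace:
            deviations.append(f"missing: {activity}")
    for before, after in precedence:
        if after not in trace:
            continue  # the rule only applies when `after` was executed
        if before not in trace or trace.index(before) > trace.index(after):
            deviations.append(f"order: {before} should precede {after}")
    return deviations


# Hypothetical guideline, invented for this example.
REQUIRED = ["Triage", "Lab test"]
PRECEDENCE = [("Triage", "Consultation"), ("Lab test", "Antibiotics")]

CONFORMING = ["Triage", "Lab test", "Consultation", "Antibiotics"]
DEVIATING = ["Consultation", "Triage", "Antibiotics"]

print(check_trace(CONFORMING, REQUIRED, PRECEDENCE))
print(check_trace(DEVIATING, REQUIRED, PRECEDENCE))
```

A listed deviation is only a starting point for analysis: whether it reflects an error, a workaround or a justified clinical decision cannot be read from the trace alone.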

D4: Break the Glass
'D3: Use Guidelines and Protocols' highlights that standardised guidelines and protocols aim to establish a high-level structure for healthcare processes. However, following the metaphor of a fire alarm, physicians and healthcare professionals might need to break the glass and deviate from the protocols that are in place. Such situations can occur both at the level of an individual patient and at the level of the system. At the patient level, alternative courses of action might need to be taken due to previously unknown co-morbidities, unexpected complications, patient preferences, or because certain (combinations of) co-morbidities are not covered by the existing guidelines [77][78][79]. At the system level, emergency situations might follow from a sudden surge in the number of incoming patients or an unforeseen reduction in staff availability [78]. Thus, healthcare processes deal with unplanned situations due to unforeseen emergencies or the optimisation of the limited resources available.
When physicians and healthcare professionals react to unforeseen or emergency situations, this is, of course, also reflected in the data. This behaviour gives rise to unexpected patterns in the data, which stresses the importance of considering contextual information when conducting process mining analyses in healthcare [80]. Contextual information, such as patient characteristics or the state of the system at a particular point in time, can be essential to explain patterns observed in the data. This understanding can enable a PM4H researcher to systematically analyse the desirability of deviations from protocols, taking into account rich contextual information. Hence, considering the context can highlight the need to fine-tune protocols or provide input for developing future policies of a healthcare institution.
In sum, PM4H researchers and practitioners need to be aware of the existence of break-the-glass situations. Moreover, building upon the available contextual information, they can try to identify and understand such situations.

D5: Consider Data at Multiple Abstraction Levels
It is a common misconception that PM4H only uses medical treatment data and, hence, is a synonym for medical treatment analysis. Medical treatment analysis is, without doubt, an important use case for PM4H [10], with a multitude of examples being reported in medical domains such as oncology [81], cardiology [82], primary care [83], and frail elderly care [84]. However, a wide spectrum of different types of data is available in the healthcare domain, both related to clinical processes as well as to organisational/administrative processes [6,7,85].
A key distinction can be made between high-level data and low-level data. Low-level data are very fine-grained data, which are recorded by medical equipment or sensors at healthcare facilities. Typically, some form of aggregation is required to retrieve meaningful patterns from large volumes of fine-grained data [86]. For instance, event data recorded by medical equipment, such as an Allura Xper X-ray system, can be used to construct realistic test profiles for fault diagnosis to identify new problems before they actually emerge [87]. Another source of low-level data are technologies such as wearable devices and Real-Time Localisation Systems. The deployment of these technologies by healthcare organisations has experienced a remarkable increase in recent years, and their data have been successfully used as input for process mining to analyse patient pathways [88], and to gain insights into personalised chronic disease management [89]. In surgical procedures, even when surgical robots have not been used, alternative sources of low-level data, such as video recordings, can be used to identify activities in order to analyse a particular procedure [28,29].
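The aggregation step mentioned above can be sketched for location data: consecutive readings at the same location are collapsed into a single higher-level 'stay' event with a start and an end. The reading format (seconds since the first reading, location name), the location names and the function name are assumptions for illustration and do not describe any specific localisation system.

```python
from itertools import groupby


def collapse_readings(readings):
    """Collapse consecutive readings at the same location into stay events.

    `readings` is a time-ordered list of (timestamp_seconds, location)
    tuples for a single patient.
    """
    stays = []
    for location, group in groupby(readings, key=lambda reading: reading[1]):
        group = list(group)
        stays.append({
            "activity": f"Stay in {location}",
            "start": group[0][0],
            "end": group[-1][0],
        })
    return stays


# Hypothetical readings taken every 30 seconds for one patient.
READINGS = [
    (0, "Waiting room"),
    (30, "Waiting room"),
    (60, "Waiting room"),
    (90, "Exam room"),
    (120, "Exam room"),
    (150, "Waiting room"),
]

STAYS = collapse_readings(READINGS)
for stay in STAYS:
    print(stay)
```

Six raw readings become three events, and the return to the waiting room is kept as a separate stay because only consecutive readings are merged.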
High-level data, which are more coarse, typically allow meaningful patterns to be obtained during the analysis without the need for aggregation. Data recorded in a hospital information system and administrative data often have a more high-level character. Due to this high-level character, such data might be suitable for cross-organisational comparisons, e.g., to share best practices amongst healthcare organisations [90]. Knowledge sharing regarding processes can be especially relevant in situations such as the COVID-19 crisis [2]. Consequently, cross-organisational comparisons can be of interest for the PM4H discipline.
Given the multitude of high-level and low-level data sources available in a typical healthcare organisation, PM4H researchers and practitioners are likely to be confronted with different types of data. Such data include logistics, billing, accounting, staff, and payroll information, and can be used to answer important questions in topics such as shift management, bed management, patient admission, transfer and diversion policies [91]. Moreover, as different stakeholders have different information needs, a PM4H research question under analysis might require the combination of very distinct data sources.

D6: Involve a Multidisciplinary Team
Healthcare processes increasingly have a multidisciplinary character [37]. Similarly, data science is, by its nature, multidisciplinary [92], and process mining is no exception [8]. Any type of process analysis could possibly include techniques from other computer science disciplines, such as artificial intelligence, machine learning, and computer vision [93].
Besides the potential involvement of expertise from the aforementioned computer science fields, PM4H researchers and practitioners should always recognise the multidisciplinary nature of healthcare processes, necessitating the involvement of extensive expertise from the healthcare domain. In order to ensure the relevance of PM4H techniques, the involvement of physicians, nurses and other healthcare professionals from all relevant departments is essential. From a practical angle, a multidisciplinary team should be composed that is closely involved in all stages of the process mining effort: data collection, data analysis, data interpretation, the communication of the results and their translation into practical actions.
When working in this multidisciplinary context, PM4H researchers and practitioners should take the following considerations into account. Firstly, the team should consider other clinicians, besides physicians. As an example, many insightful PM4H studies in the literature demonstrate a deep involvement of nurses in the generation of their findings [61]. Secondly, PM4H projects must use the appropriate medical language, terminology, codes and customs to ensure mutual understanding in a multidisciplinary context. In this regard, a broad range of standards and ontologies on medical concepts, such as medications, procedures, and diagnoses, already exist. Examples of standardised terminologies are well-established clinical ontologies such as SNOMED CT and ICD-10 [94].

D7: Focus on the Patient
Many actors are involved in a typical healthcare process. These include patients, the patients' relatives, physicians, nurses, other healthcare professionals and supporting staff. As mentioned in 'D6: Involve a Multidisciplinary Team', different staff categories should be involved in a process mining project team. However, it needs to be recognised that the patient is at the heart of all healthcare processes. Hence, PM4H researchers should always make sure to look at a process from the patient's perspective, even when patients are not explicitly represented in the project team. This point implies that the patient perspective should receive explicit attention when developing methods, tools, and frameworks. In this way, PM4H can support healthcare organisations in providing patient-centred care, a key indicator for care quality [95,96].
When considering the journey of a patient with a particular medical condition, it should be recognised that the patient typically crosses the boundaries of several healthcare organisations. Besides the hospital, a patient might also visit other professionals such as a general practitioner and a paramedic. PM4H researchers need to take this cross-organisational character of the patient journey into account, as data will also be spread over various organisations [7]. Even when considering a process within a single healthcare organisation, a patient-centred approach requires a specific mindset, as process mining analyses typically focus on gaining insights into processes at the system level. In contrast, clinicians tend to focus on care for individual patients, which can have implications for the performance of the system as a whole. For instance, when a physician wants to completely finish a procedure for one patient before assessing new patients, this might lead to increasing waiting times at the system level. This example highlights that, next to the system perspective, PM4H research should also be aware of the individual patient when studying processes. This analysis involves both the clinical perspective and the patient experience perspective.

D8: Think about White-box Approaches
Physicians face difficult situations and decisions in their daily routine. Despite their extensive training, decisions on the required diagnostics and treatments for a patient will rely on a risk-benefit assessment by a physician, which will be context-dependent. The unprecedented advances in artificial intelligence and machine learning have fostered new decision support systems delivering accurate information to support physicians when making clinical decisions. However, one of the biggest challenges is that physicians are reluctant to trust and adopt recommendations by a system that they do not fully understand, referring to it as a black box [97].
To enable the data-driven improvement of healthcare processes, there is a need for white-box approaches that enable healthcare actors to understand the origins of particular observations. Process mining techniques are perceived as white boxes [98], since their final goal is not to provide a categorical answer, but to provide healthcare actors with techniques to get a better understanding of what is going on in their processes. The strength of process mining lies in its focus on understandability. Therefore, PM4H researchers and practitioners should be aware of the critical importance of understandability. In that respect, they should be prepared to use novel visualisation techniques and interactive models that advance the field in the direction of providing healthcare professionals with the insights they need in an understandable way [99]. Moreover, there are decisions that cannot always be made effectively based on a few parameters and constructs (from simple switch-case patterns to complex DMN features). Those decision points can benefit from more understandable formalisms, such as Computer-Interpretable Clinical Guidelines (CIGs) [78], which allow the enactment of evidence-based recommendations intended to optimize patient care [100]. Finally, PM4H researchers and practitioners need to recognise that approaches that recommend and explain various courses of action are more likely to be accepted than counterparts that 'enforce' a particular one [101].

D9: Generate Sensitive and Low-Quality Data
When performing process mining in a healthcare context, in particular in a clinical setting, data sensitivity needs to be taken into account. The data at hand might include information such as a patient's current medical condition, co-morbidities, treatments, etc. As a consequence, healthcare-related event logs must be handled carefully. Healthcare data are well known for being sensitive because of their confidentiality, and their usage, storage, and transfer are strictly regulated by institutions, countries, and even international treaties [102]. Therefore, as responsible citizens, PM4H practitioners must consider ethics in general and data privacy in particular, whether the data are put to primary use (e.g., to improve patient outcomes) or secondary use (e.g., to improve health services outcomes) [103].
Next to data privacy, poor data quality is also an important issue within the healthcare domain. In many countries, healthcare processes are still paper-based to a certain extent, which makes it challenging to attach precise timestamps to activities that have been conducted. Even though integrated HISs are becoming more pervasive, this does not guarantee high-quality data [104]. As shown in several existing works, healthcare processes tend to be characterised by poor quality data [62,105,106]. A variety of data quality issues can be identified, including missing events, imprecise timestamps, and incorrect timestamps [5,62]. The presence of data quality problems can be attributed, at least partly, to the fact that event recording often still requires a manual action from clinicians or administrative staff. When, for instance, an activity is not recorded in the HIS at the time that it was performed, the timestamp of the associated event will not correspond with the activity execution time. Moreover, clinicians tend to develop their own habits in terms of system registrations, e.g., based on their personal interpretation of the situation and the registration options provided by the system. The latter can give rise to different registration patterns across clinicians involved in the same process.
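Some of the symptoms listed above can be screened for with very simple checks. The Python sketch below counts three of them in a small, invented event list: events without a timestamp, timestamps at exactly midnight (which may indicate that only a date was registered), and several events of one case sharing the same timestamp (which may point to registration in batch after the fact). The event layout, the function name `quality_report` and the heuristics are assumptions for illustration only; they flag candidates for inspection and do not prove that an entry is wrong.

```python
from collections import Counter
from datetime import datetime, time

def quality_report(events):
    """Count simple data quality symptoms in a list of event dicts.

    Each event is a dict with keys 'case', 'activity' and 'timestamp'
    (a datetime, or None when nothing was recorded). Hypothetical layout.
    """
    report = Counter()
    seen = Counter()
    for event in events:
        ts = event["timestamp"]
        if ts is None:
            report["missing_timestamp"] += 1
            continue
        if ts.time() == time(0, 0):
            # Exactly midnight often means only the date was registered.
            report["date_only_timestamp"] += 1
        seen[(event["case"], ts)] += 1
    # Events of one case that share a timestamp may stem from batch registration.
    report["shared_timestamp"] = sum(n for n in seen.values() if n > 1)
    return dict(report)

# Invented sample data.
events = [
    {"case": "c1", "activity": "register", "timestamp": datetime(2021, 3, 1, 0, 0)},
    {"case": "c1", "activity": "blood test", "timestamp": datetime(2021, 3, 1, 10, 30)},
    {"case": "c1", "activity": "x-ray", "timestamp": datetime(2021, 3, 1, 10, 30)},
    {"case": "c1", "activity": "discharge", "timestamp": None},
]
report = quality_report(events)
print(report)
```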
From the previous, it follows that PM4H is intrinsically connected to the need for explicit attention to data privacy and data quality. Hence, PM4H researchers and practitioners need to take both of these concepts carefully into account when they have the ambition to support the management and improvement of real-life healthcare processes.

D10: Handle Rapid Evolutions and New Paradigms
Healthcare is a domain that exhibits rapid and continuous evolutions, which also has implications for healthcare processes. A prime example is the change in healthcare processes due to the accumulation of knowledge from clinical research [7,51,65,66]. This principle of evidence-based medicine is based on the critical appraisal of various types of scientific studies such as randomized controlled trials and cohort studies. High-quality research findings are integrated in clinical practice, which can induce changes in diagnostic or treatment processes [107,108]. In a similar vein, rapid evolutions in technology also impact processes in healthcare. For instance, mobile health solutions present various opportunities to reshape healthcare processes. This relates, amongst others, to remote monitoring, which enables healthcare professionals to follow up specific clinical parameters remotely [109].
Healthcare does not only evolve due to new clinical knowledge or technological advances, but also due to the emergence of novel paradigms. For instance, the patient-centred care paradigm implies that care should carefully consider the needs, values and preferences of each individual patient [110]. Within this context, patient-reported measures are increasingly gaining attention. In this respect, a distinction is made between patient-reported outcome measures (PROMs), focusing on health aspects such as symptoms and treatment side effects, and patient-reported experience measures (PREMs), centering around experiences when receiving care [111]. Both PROMs and PREMs constitute highly valuable output to evaluate and redesign healthcare processes.
In order to provide valuable contributions to the healthcare domain, the PM4H community should be able to handle rapid evolutions and new paradigms. This requires a permanent awareness of new trends and innovations, as well as a careful assessment of their impact on healthcare processes and, hence, PM4H. Advances in the healthcare domain might give rise to new information needs, which PM4H researchers and practitioners should anticipate.

Challenges
The previous section outlined the distinguishing characteristics of healthcare processes. These characteristics give rise to challenges that need to be studied by the PM4H community (both researchers and practitioners) to structurally embed process mining in the healthcare domain as an instrument to support evidence-based process analysis and continuous improvement. At the research level, the outlined challenges will require fundamental and translational research, where the latter is needed to support the actual uptake of fundamental research in healthcare practice. The challenges are meant to guide both aspiring and established PM4H researchers in their search for relevant research endeavors. Similar to the distinguishing characteristics, we do not claim that the challenges are exclusive to the healthcare domain. However, we consider them particularly relevant to move PM4H forward in the future.
The remainder of this section will outline the key challenges for PM4H. Each subsection's title will also link the challenge to one or more of the distinguishing characteristics by using the labels introduced in Section 3.

C1: Design Dedicated/Tailored Methodologies and Frameworks (D1-D9)
Given the distinguishing characteristics outlined in Section 3, the domain needs novel PM4H methodologies and frameworks, which guide researchers and practitioners through the various phases of a PM4H analysis. Such an analysis ranges from the identification of the research problem, through the composition of the event log (taking into account considerations regarding privacy and security), to the actual analysis and interpretation of the results [7]. New methodologies and frameworks should remain flexible, such that they can be adapted to the specific characteristics of a country, institution, department, process, clinician, or even the patient involved.
General process mining methodologies, such as L* [8], PM² [112] and Aguirre et al. [113], were key factors in the rising popularity of process mining, since they opened the door for researchers of other disciplines to apply process mining to a wide range of contexts [114]. In an analogous way, healthcare-specific methodologies for PM4H will allow healthcare actors to incorporate process mining into their analysis [115][116][117]. Moreover, such frameworks may allow PM4H techniques to be reused in different contexts, providing the means for a fair comparison among techniques across different scenarios [118]. Frameworks for cross-organisational [90] and cross-national [75] studies are also an interesting research line to advance the PM4H state-of-the-art, as well as the generation of methodologies to conduct meta-analyses based on outcomes and metrics obtained with PM4H techniques [119].

C2: Discover Beyond Discovery (D3, D5)
As highlighted in Section 2, process mining techniques are commonly classified into three types: discovery, conformance, and enhancement [8]. However, the evolution of these three types of techniques over time has not been the same [120]. Initially, most of the process mining techniques were discovery techniques, making it the dominant research stream in the early days of process mining. In later years, conformance checking has gradually started gaining prominence, and more recently, the number of novel enhancement techniques is starting to rise significantly. Recently, there has also been a surge of novel techniques, such as event and trace abstraction techniques [86], simulation [121], and predictive process monitoring [122].
PM4H manifests a similar evolution, where discovery is the most dominant use case in published PM4H literature [10,81,82,83,84,41]. Although works that focus on discovery are highly relevant, novel conformance checking and enhancement techniques, tailored towards healthcare, are needed. Some seminal works have been proposed that explore conformance for healthcare [123][124][125]. However, they are still in their infancy, leaving significant potential for future growth, especially because conformance checking approaches can be used to evaluate the various guidelines and protocols that exist in healthcare ('D3: Use Guidelines and Protocols'). Similarly, some enhancement techniques have been proposed for healthcare, but there is still a lot to explore [126], especially regarding highly relevant process perspectives for healthcare such as the time perspective [52,127] and the resource perspective [30]. These two perspectives appear to be particularly promising since they can support the long-term resource planning and service (re-)design of healthcare organisations [128]. Process mining can also help to build advanced simulation and forecasting models for the operations of healthcare departments (e.g., the emergency department) or for diagnostic-therapeutic pathways (e.g., patients suffering from breast cancer) [115,129].
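To make the idea of checking traces against a guideline concrete, the Python sketch below tests a single precedence rule: one activity may only occur after another activity has occurred. This is a deliberately simple, rule-based check and not one of the conformance checking techniques cited above; the function name `violates_precedence`, the rule and the traces are invented for the example.

```python
def violates_precedence(trace, first, then):
    """Return True if `then` occurs in the trace before any occurrence of `first`."""
    seen_first = False
    for activity in trace:
        if activity == first:
            seen_first = True
        elif activity == then and not seen_first:
            # `then` happened although `first` had not been observed yet.
            return True
    return False

# Hypothetical rule: 'surgery' may only occur after 'consent'.
traces = {
    "c1": ["register", "consent", "surgery"],
    "c2": ["register", "surgery", "consent"],
    "c3": ["register", "triage"],
}
violations = [case for case, trace in traces.items()
              if violates_precedence(trace, "consent", "surgery")]
print(violations)
```

A flagged case is not necessarily undesirable: as discussed under 'D4: Break the Glass', a deviation may be justified by the context, so such a check would typically only be a starting point for further analysis.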

C3: Mind the Concept Drift (D1, D3, D4)
Clinical practice guidelines and protocols tend to change over time, e.g., due to advances in clinical research [7,51,65,66]. Even without changes to guidelines and protocols, the execution of healthcare processes could change, for instance, due to seasonal factors, to cope with the influx of patients in a hospital during winter months, or because an alternative way of working is introduced. Gaining insights into such dynamic processes is challenging and can benefit from techniques to detect changes in the process and determine their effects.
To this end, research on concept drift, which refers to the phenomenon whereby processes change over time, is essential [130][131][132]. The evolution of the COVID-19 treatment process over time is an obvious illustration of this phenomenon [133]. Moreover, distinguishing characteristics such as the high variability of healthcare processes also make it challenging to identify and study change patterns. While these aspects show the need for further dedicated research efforts, concept drift should systematically be taken into account in PM4H.
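A naive way to obtain a first indication of drift is to compare how often each trace variant occurs in two time windows. The Python sketch below does this with the total variation distance between the two variant distributions (0 when they are identical, 1 when they share no variant). It is an assumption-laden illustration rather than one of the concept drift techniques cited above: the window split, the function names and the traces are invented, and with highly variable healthcare processes a variant-based comparison may report large distances even without a real change.

```python
from collections import Counter

def variant_distribution(traces):
    """Relative frequency of each trace variant in a list of traces."""
    counts = Counter(tuple(trace) for trace in traces)
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()} if total else {}

def drift_score(window_a, window_b):
    """Total variation distance between the variant distributions of two windows."""
    a = variant_distribution(window_a)
    b = variant_distribution(window_b)
    return 0.5 * sum(abs(a.get(v, 0.0) - b.get(v, 0.0)) for v in set(a) | set(b))

# Invented traces: an extra test step becomes more common in the later window.
old_way = ["register", "triage", "treat"]
new_way = ["register", "test", "triage", "treat"]
before = [old_way, old_way, old_way, new_way]
after = [old_way, new_way, new_way, new_way]

score = drift_score(before, after)
print(score)
```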

C4: Deal with Reality (D2, D5, D9)
'In theory there is no difference between theory and practice, while in practice there is'. This proverb seems to have particular significance for PM4H. Whenever a novel PM4H technique is being developed, thorough testing and evaluation is a key element [118,134]. Similar to any general process mining technique, synthetic data can be used during the development of a PM4H technique in order to evaluate its performance in a controlled environment. However, PM4H research has the goal to generate research with societal impact and, hence, research should focus on novel approaches able to handle real-life healthcare data. Moreover, the techniques should be able to handle large amounts of data and cope with input data containing significant variability. Carefully considering real-life healthcare data is crucial as it can significantly influence the premises and design of the PM4H domain. When a researcher does not have access to real-life data due to the absence of partnerships with healthcare institutions, publicly available healthcare datasets can be used instead (footnote 18). In this respect, community efforts can also be made in the direction of providing and maintaining extensively documented, publicly accessible datasets to PM4H researchers and practitioners.

C5: Do It Yourself (DIY) (D6, D8)
As outlined in 'D6: Involve a Multidisciplinary Team', PM4H is a multidisciplinary domain that requires the involvement of physicians, nurses, and other healthcare professionals. To support a widespread use of process mining, healthcare professionals should be able to perform their analyses with little or no assistance of process mining experts. This direct involvement of healthcare actors has several implications that need to be taken into account when PM4H researchers propose new techniques. Firstly, during technique development, healthcare actors should be targeted as end users. Therefore, techniques should not require extensive expertise in terms of process mining in order to be used correctly. Secondly, the output of techniques should be understandable for healthcare actors. For instance, in case of control-flow discovery, the process modelling notation used to visualise the output should be appropriate for non-process mining experts from the healthcare domain [7]. To that end, simple visualisations, such as Directly-Follows Graphs (DFGs) [135] and Business Process Model and Notation (BPMN) diagrams [136,137], may be effective. However, when using notations without formal semantics, it is important to consider their well-known limitations, such as the risk of misleading statistics and ambiguous visualizations [138]. Finally, all tools or software developed in PM4H research should be user-friendly, with specific attention to terminology, human-computer interaction, and visualisations.
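The appeal of a DFG lies partly in how little is needed to construct one: it simply counts how often one activity is directly followed by another within the same case. The Python sketch below shows this under the assumption that the event log is available as (case identifier, timestamp, activity) tuples with sortable ISO-formatted timestamps; the function name `discover_dfg` and the data are invented for illustration.

```python
from collections import Counter, defaultdict

def discover_dfg(events):
    """Count directly-follows pairs from (case_id, timestamp, activity) tuples."""
    traces = defaultdict(list)
    # Order events per case by timestamp to reconstruct each trace.
    for case, ts, activity in sorted(events, key=lambda e: (e[0], e[1])):
        traces[case].append(activity)
    dfg = Counter()
    for trace in traces.values():
        # Pair every activity with its direct successor in the same case.
        dfg.update(zip(trace, trace[1:]))
    return dfg

# Invented sample log with two cases.
events = [
    ("c1", "2021-03-01T09:00", "register"),
    ("c1", "2021-03-01T09:10", "triage"),
    ("c1", "2021-03-01T09:45", "treat"),
    ("c2", "2021-03-01T09:05", "register"),
    ("c2", "2021-03-01T09:30", "triage"),
    ("c2", "2021-03-01T10:00", "discharge"),
]
dfg = discover_dfg(events)
for (source, target), count in sorted(dfg.items()):
    print(f"{source} -> {target}: {count}")
```

The same simplicity is behind the limitations noted above: the edge counts say nothing about which paths occur together within one case, so frequencies read from the graph should be interpreted with care.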

C6: Pay Attention to Data Quality (D9)
The 'garbage in, garbage out' principle also holds for PM4H, implying that the quality of all analyses ultimately depends upon the quality of the data used as input. As mentioned in 'D9: Generate Sensitive and Low-Quality Data', real-life data from HISs tends to suffer from data quality issues, which is troublesome for the use of these data for process mining purposes. The widespread presence of data quality issues stresses the need for the development of techniques to systematically assess and improve the quality of healthcare event logs. Recent approaches that have been developed, for example [62,139,140,141,142], demonstrate the relevance of this research topic. A recent overview on event log quality in PM4H is presented in Martin [106]. While existing approaches typically focus on data quality in the context of an existing event log, it should be recognised that data quality issues are also related to data management and extraction. This is because stakeholders typically have difficulties extracting this data [143].
Given the direct impact of data quality issues on the outcomes of process mining and the high prevalence of such issues in a healthcare context, it is a challenge to raise awareness about the topic within the healthcare sector [104]. Key players, such as hospital managers and physicians, should consider the impact of data quality issues on potential process mining analyses. Besides improving awareness, techniques to support the (interactive) identification of data quality issues, as well as their rectification (if possible), are valuable [144]. Moreover, analysis outcomes should also contain a reflection on the quality of the underlying data [7]. When interpreting the results, expressing the uncertainty degree of process mining outcomes can reflect the required level of caution on the part of healthcare actors. This aspect is especially important when process mining insights will be used within the context of clinical decision-making.

C7: Take Care of Privacy and Security (D9)
PM4H, especially when used in a clinical context, builds upon sensitive data regarding patients. Adequately safeguarding data privacy and security is of key importance for PM4H, which includes responsible data science aspects such as fairness (avoiding prejudice) and confidentiality (not revealing sensitive information) [145]. In the broader process mining community, the importance of data privacy and security has already been recognised in the Process Mining Manifesto [146]. However, actual research contributions focusing on privacy-preserving techniques targeted to process-related data only started to appear recently [147,148]. It still remains difficult to balance safeguarding privacy on the one hand and preserving the value of the event log for process mining purposes on the other hand [102].
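The trade-off between privacy and analytical value can be illustrated with a basic pseudonymisation step. The Python sketch below replaces case identifiers by keyed hashes and absolute timestamps by offsets from the start of each case, which keeps the ordering and durations needed for process mining while removing direct identifiers and calendar dates. The function name `pseudonymise`, the key and the data are assumptions for this example. Note that pseudonymisation of this kind is not anonymisation: rare activity sequences may still allow re-identification, which is one reason why dedicated privacy-preserving techniques are being studied.

```python
import hashlib
import hmac
from datetime import datetime

def pseudonymise(events, secret_key):
    """Replace case ids by keyed hashes and timestamps by offsets from case start.

    `events` is a list of (case_id, timestamp, activity) tuples;
    `secret_key` is a bytes value that should stay with the data owner.
    """
    # First pass: find the earliest timestamp of every case.
    starts = {}
    for case, ts, _ in events:
        starts[case] = min(ts, starts.get(case, ts))
    # Second pass: emit pseudonymised events.
    result = []
    for case, ts, activity in events:
        digest = hmac.new(secret_key, case.encode("utf-8"), hashlib.sha256).hexdigest()
        result.append((digest[:16], ts - starts[case], activity))
    return result

# Invented sample data and key.
events = [
    ("patient-001", datetime(2021, 3, 1, 9, 0), "register"),
    ("patient-001", datetime(2021, 3, 1, 9, 30), "triage"),
    ("patient-002", datetime(2021, 3, 2, 14, 0), "register"),
]
pseudonymised = pseudonymise(events, secret_key=b"example-key-kept-by-the-hospital")
for row in pseudonymised:
    print(row)
```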
While anonymisation and other privacy-preserving techniques can support healthcare organisations when creating an event log suitable for data exchange (e.g., between a hospital and research partners), PM4H could also explore alternative modi operandi that enable collaboration between researchers and healthcare organisations while limiting the need for data exchange. Establishing methods to efficiently set up such collaborations can move PM4H forward, as it reduces the risk of data breaches. This point is especially relevant in times where data privacy and security concerns tend to make healthcare organisations more reluctant to share data for research purposes.

C8: Look at the Process through the Patient's Eyes (D7)
Within the context of a process mining project, information needs can differ depending on the involved stakeholder. As mentioned earlier, potential stakeholders in the healthcare domain include hospital managers, physicians, nurses, patients, and their relatives [7]. Currently, many process mining initiatives in healthcare target hospital management, heads of department or clinicians and often aim to provide a high-level overview of process behaviour. However, as reflected by characteristic 'D7: Focus on the Patient', all processes directly or indirectly focus on the patient.
Against this background, there is a need for studying healthcare processes through the patient's eyes: what is the patient's journey while being diagnosed or treated for a particular (potentially chronic) condition? Studying a process from the patient's perspective can help physicians to consider the full patient journey when making decisions at a particular point in time. Even when adopting the patient's perspective does not directly result in options to improve the clinical or administrative outcomes, it might uncover ways to enhance the patient's experience [149,150]. Looking at the process through the patient's eyes should also involve combining data from various departments, and even different healthcare institutions [7]. In this way, process mining analyses will move from a single department/institution perspective to a cross-department/institution perspective. This point of view constitutes a complementary perspective to existing process mining research in healthcare.

C9: Complement HISs with the Process Perspective (D2, D4, D5)
Besides focusing on the data and their uses for process mining, PM4H research must also focus on the source of the data: the Health Information Systems (HISs), which can range from more traditional hospital information systems [151,152] to more integrated information systems [153]. Such systems could benefit from being complemented with the process perspective, giving their processes a more predominant role, and interacting with process-aware systems to support both the operations of the organisation and the processes involved [154]. This combination will open the door for process mining to many of the benefits of systems with a strong process perspective, such as Process-Aware Information Systems (PAISs) [155], but will also require research and study of the inherent problems and future directions of both HISs [156] and their combination with process awareness [157]. This study must include the understanding of how HISs must integrate the different healthcare data sources ('D5: Consider Data at Multiple Abstraction Levels') and handle unexpected behaviour when an unpredictable event occurs ('D4: Break the Glass') or when a deadline will be violated [158] ('D2: Value the Infrequent Behaviour'), even being able to perform immediate decision making while analyzing the data in real time (e.g., streaming process mining) [159].

C10: Evolve in Symbiosis with the Developments in the Healthcare Domain (D1, D3, D7, D10)
Healthcare is a domain that continuously develops due to innovations in, amongst others, medicines, medical procedures and technologies. In order to support healthcare professionals on an ongoing basis, PM4H methods should evolve in symbiosis with the developments in healthcare. For instance, treatment processes are rapidly being adapted based on emerging scientific evidence [2]. A clear illustration is the COVID-19 treatment process, which rapidly changed due to the accumulation of new insights into the virus and its treatment [133,160]. PM4H should be able to cope with these circumstances. Moreover, PM4H can support the adaptation of treatment processes by, e.g., enabling the efficient comparison of processes with respect to the clinical outcomes they generate.
Another important evolution in healthcare is the prominent presence of personalised medicine, which implies that medical treatments are increasingly tailored towards the needs of each individual patient [50,161]. In order to support personalised medicine, PM4H can develop techniques to efficiently assess the suitability of a particular treatment process for a patient with a specific profile. For instance, Valero-Ramon et al. [89] recently proposed an approach to discover dynamic risk models for patients suffering from chronic diseases based on sensor data. These models can be leveraged to customise treatments based on a patient's unique behaviour [89]. When process data can be enriched with outcome and cost data, PM4H also has the potential to study the effect of personalised treatment processes compared to standard practice in detail.

Concluding remarks
The goal of this paper is to support the PM4H domain by conveying a shared perspective regarding the distinguishing characteristics of healthcare processes and the associated key challenges of PM4H. In particular, ten distinguishing characteristics and ten challenges are outlined, the latter requiring specific attention of the PM4H community. In order to tackle the key challenges, close collaboration is required between experts from various domains. In that respect, the symbiosis between data- and process-related expertise on the one hand and clinical expertise on the other is essential. Within a research context, close interaction is required between research fields including medicine, medical informatics, computer science, and business process management. To make PM4H flourish, researchers from these various fields need to join forces in a spirit of co-creation. In this way, they may be able to develop innovative process mining methods that maximally support clinicians and decision-makers to manage and improve real-world healthcare processes in an evidence-based way.
While this paper shapes the identity of PM4H, there is no intention to position PM4H as an isolated domain. On the contrary: as a relatively young research field, the PM4H community should actively connect to and learn from more mature fields such as artificial intelligence, data mining and machine learning [162,163]. The strengths of PM4H will be based on intelligently combining insights from various research areas. This intersection of disciplines will increasingly involve research on big data, the Internet of Things and deep learning [164][165][166]. Translating concepts and techniques from various other research fields to the PM4H setting will present continuous research challenges. Hence, there is much to be done and much to be achieved.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Table 1
Distinguishing characteristics and challenges of process mining in healthcare.