A health equity monitoring framework based on process mining

In the United States, there is a proposal to link hospital Medicare payments with health equity measures, signaling a need to precisely measure equity in healthcare delivery. Despite significant research demonstrating disparities in health care outcomes and access, there is a noticeable gap in tools available to assess health equity across various health conditions and treatments. The available tools often focus on a single area of patient care, such as medication delivery, but fail to examine the entire health care process. The objective of this study is to propose a process mining framework that provides a comprehensive view of health equity. Using event logs, which track all actions during patient care, this method allows us to examine disparities not only in single and multiple treatment steps but also in the broader strategy of treatment delivery. We applied this framework to the management of patients with sepsis in the Intensive Care Unit (ICU), focusing on sex and English language proficiency. We found no significant differences between the treatment of male and female patients. However, patients who do not speak English experienced a notable delay in the start of their treatment, even though their illness was equally severe and subsequent treatments were similar. This framework subsumes existing individual approaches to measuring health inequities and offers a comprehensive approach to pinpointing and investigating healthcare disparities, providing a valuable tool for research and policy-making aimed at more equitable healthcare.


Introduction
Legislation in the United States (US) has recently proposed tying Medicare reimbursement to the health equity of the treatments provided [1] and incorporating health equity standards into the hospital accreditation program of the Joint Commission [2]. This legislation is motivated by well-established health equity research and previously reported care disparities, such as racial differences in prenatal care [3] or worse outcomes associated with linguistic disparities [4]. These outcome disparities are subject to regular reporting [5], and their causes have been broken down into epigenetic racial differences, cultural factors, patient preferences, implicit provider biases, and structural inequalities [6].
Linking reimbursement to health equity requires a means to measure the health equity of delivered care. While individual studies measure health equity in the delivery of care for specific diagnoses and treatments, there is a lack of general tools to measure inequities in healthcare delivery. Furthermore, the causes of health inequity remain unclear, and studies on health equity offer mixed insights. While some find no notable treatment disparities after accounting for confounders [7][8][9][10][11], others detect differences in treatment without adverse outcomes for minorities [12]. However, a few link treatment disparities directly to negative outcomes, such as delayed ICU transfers [13], less preventive care [14], or less subspecialty care [15].
One major limitation of the current literature is that only single steps of a care pathway are investigated, such as ventilation or antibiotic treatment. As a consequence, while confounders outside of the care process are accounted for, e.g., socioeconomic status, age, sex, comorbidities, and illness severity, the confounders within the care process, e.g., previous and subsequent treatment steps, their duration, and time delays, are often left out [8]. Using more granular clinical data may, therefore, allow for adjustment for these important factors when assessing inequities in care provision.
Process mining is a computer science discipline that analyzes data from end-to-end process executions [16]. It builds on event logs, which collect all actions executed during an instance of a process, including the predecessor and successor steps of a single treatment. Currently, process mining has been used to optimize operational processes within hospitals [17,18] and to learn models of patient treatment pathways [19][20][21]. However, process mining has not yet been utilized to investigate biases and inequities in delivering care. Using the event log, process mining can assess inequities in single treatment steps, inequities arising across multiple dependent treatment steps, and inequities at the global level of treatment allocation. In this paper, we introduce a process mining approach to construct a general framework for measuring health inequities. This method not only highlights single-treatment step inequities but also captures disparities in multiple interdependent treatment steps and in the broader treatment allocation strategy, and it can be used as a general pipeline to investigate inequities in care delivery. Sepsis management is a treatment process characterized by its complexity and its dependency on timely responses. Therefore, to demonstrate the applicability of our framework, we apply it to assess inequities in care for patients with sepsis in the ICU attributed to sex and English language proficiency.

Method
The proposed framework is an end-to-end solution that can be applied to an Electronic Health System (EHS) and provides various health inequity measures as output. The framework consists of two major steps. First, the data are extracted into an event log that contains treatment information; each treatment is referred to as a treatment step. Next, we split the patients in the event log into groups based on different markers of health inequity and investigate disparities with respect to single treatment steps, multiple treatment steps, or the global distribution of treatments. The workflow of the framework is shown in Fig 1.

Data preparation
The data is prepared by homogenizing patients by diagnosis and comorbidities, then identifying treatments, and creating an event log.
Homogenizing patients. First, the primary patient diagnosis is identified, and patients not conforming to this diagnosis are filtered out. We then conduct a two-step procedure to control for comorbidities, which can be applied depending on the availability and size of the data set: first, we identify specific comorbidities strongly associated with negative outcomes, and, second, we split patients into groups with similar high-risk comorbidities.
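A minimal Python sketch of the homogenization step may help make it concrete. The `Patient` record, its field names, and the `HIGH_RISK` set (which mirrors the three comorbidities used in the cohort development section) are illustrative assumptions, not the authors' code. Patients without any high-risk comorbidity are dropped, as in the sepsis cohort selection.

```python
from dataclasses import dataclass, field

# Hypothetical: comorbidities assumed to carry a high risk of negative outcomes
# (mirrors the three comorbidities used for the sepsis cohort).
HIGH_RISK = frozenset({
    "congestive_heart_failure",
    "myocardial_infarction",
    "chronic_pulmonary_disease",
})


@dataclass
class Patient:
    patient_id: str
    primary_diagnosis: str
    comorbidities: set = field(default_factory=set)


def homogenize(patients, diagnosis, high_risk=HIGH_RISK):
    """Filter on the primary diagnosis, then group patients by their high-risk comorbidities."""
    groups = {}
    for p in patients:
        if p.primary_diagnosis != diagnosis:
            continue  # not the diagnosis under study
        key = frozenset(p.comorbidities & high_risk)
        if not key:
            continue  # no high-risk comorbidity: excluded, as in the cohort selection
        groups.setdefault(key, []).append(p.patient_id)
    return groups
```

Each key of the returned dictionary is one high-risk comorbidity profile; when data are scarce, such profiles could be merged into a single group.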
Identifying treatments. Treatment identification involves linking entries in the EHS database to the occurrence of a treatment and its associated data. This step is conducted by domain experts who can identify relevant treatments and their attributes for the corresponding diagnosis. These treatment attributes are then extracted, and the conducted treatments are associated with patients through the database information.
Creating the event log. The resulting data are extracted into a table in which each row describes an event, containing a patient identifier and the administered treatment, including its start and end time, the amount administered, and machine parameters. Additionally, the event log contains a patient data table with the personal attributes of each patient. We group all events associated with the same patient into a so-called case, which provides the full set of treatment steps for a single patient. We extract the outcomes for each patient and associate them with the corresponding case. Next, we introduce variants, which abstract a case to its order of treatments.
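The grouping of events into cases can be sketched as follows. The `Event` schema is a hypothetical minimal version of the event table (amounts and machine parameters are omitted), and `build_cases` and `attach_outcomes` are illustrative helper names, not part of the published code.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One row of the event table (hypothetical minimal schema)."""
    patient_id: str
    treatment: str
    start: datetime
    end: datetime


def build_cases(events):
    """Group events by patient into cases, each ordered by treatment start time."""
    cases = defaultdict(list)
    for event in events:
        cases[event.patient_id].append(event)
    return {pid: sorted(evs, key=lambda e: e.start) for pid, evs in cases.items()}


def attach_outcomes(cases, outcomes):
    """Associate each case with its patient's outcome (None if unknown)."""
    return {pid: {"events": evs, "outcome": outcomes.get(pid)} for pid, evs in cases.items()}
```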
Treatment variants. The specific order in which different treatment steps start and end is called a treatment variant. The treatment steps of each patient are ordered by time of occurrence to retrieve the treatment variant of each case. Two cases belong to the same treatment variant if the order in which their treatments started and ended is the same. Subsequently, the data are prepared for bias analysis by splitting the variants into multiple groups according to a chosen marker of health inequity, e.g., ethnicity, language, age, or sex.
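The variant definition can be illustrated with a short sketch: every treatment contributes a start marker and an end marker, and the variant is the time-ordered sequence of these markers. The tie-breaking rule (starts before ends, then alphabetical by name) and the helper names are assumptions made here for determinism; the paper does not specify how simultaneous events are ordered.

```python
from collections import Counter


def variant(case):
    """Return the order in which the treatments of one case start and end.

    case: iterable of (treatment, start, end) with comparable times.
    Ties are broken by placing starts before ends, then by name (an assumption).
    """
    markers = []
    for treatment, start, end in case:
        markers.append((start, 0, treatment))
        markers.append((end, 1, treatment))
    markers.sort()
    return tuple(f"{name}:{'start' if kind == 0 else 'end'}" for _, kind, name in markers)


def variants_by_group(cases, group_of):
    """Count patients per variant separately for each value of a sensitive attribute."""
    counts = {}
    for pid, case in cases.items():
        counts.setdefault(group_of[pid], Counter())[variant(case)] += 1
    return counts
```

Two cases with different absolute times but the same start/end ordering map to the same tuple, which is exactly the equivalence described above.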

Bias analysis
The core part of our framework is bias and disparity analysis. We subdivide this analysis into three types of biases: single-treatment step biases, multiple-treatment step biases, and global treatment biases. Each of these three categories contains several specific biases. Our framework is extensible to new or custom disparity analyses that local data experts wish to conduct.
Single-treatment step biases. Single-treatment step biases concern the manifestation of a single part of the overall treatment for all patients who follow the same treatment variant. Aggregated values of a characteristic of the considered treatment step are compared between patient groups. This can be any characteristic of the treatment, such as the amount of administered antibiotics. In our baseline version of the framework, we include the average duration of the considered treatment step. Furthermore, we compute the average Sequential Organ Failure Assessment (SOFA) score at the start of the treatment as a control variable.
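A minimal sketch of the baseline single-step comparison, assuming the input is already restricted to one treatment step of one variant and arrives as hypothetical `(group, duration, sofa_at_start)` tuples; the function name is illustrative.

```python
from statistics import mean


def single_step_summary(records):
    """Per-group mean duration and mean SOFA at start for one treatment step.

    records: iterable of (group, duration, sofa_at_start) for patients who all
    follow the same treatment variant (hypothetical input format).
    """
    by_group = {}
    for group, duration, sofa in records:
        durations, sofas = by_group.setdefault(group, ([], []))
        durations.append(duration)
        sofas.append(sofa)
    return {
        g: {"mean_duration": mean(d), "mean_sofa": mean(s), "n": len(d)}
        for g, (d, s) in by_group.items()
    }
```

Reporting `n` alongside the means makes it visible when one group's estimate rests on far fewer patients.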
Multiple-treatment step biases. Multiple-treatment step biases are differences between patient groups in the relationship between multiple treatment steps within the same treatment variant. Aggregated statistics are calculated for a collection of treatment steps. An example is the time between two treatment steps, e.g., the time between admission and the first administration of antibiotics. In our baseline version of the framework, we calculate the average time between subsequent treatment steps.
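The baseline "time between subsequent treatment steps" can be sketched as follows. Each case is assumed to be a list of `(marker, time)` pairs already ordered as in the shared variant; the helper names and input format are hypothetical. If the same transition occurred twice within a variant, this sketch would pool both gaps under one key.

```python
from statistics import mean


def transition_gaps(case):
    """Elapsed time between each pair of consecutive markers in one ordered case."""
    return [((a, b), t_b - t_a) for (a, t_a), (b, t_b) in zip(case, case[1:])]


def mean_gaps_by_group(cases, group_of):
    """Average each consecutive-step gap per patient group (cases share one variant)."""
    collected = {}
    for pid, case in cases.items():
        per_group = collected.setdefault(group_of[pid], {})
        for transition, gap in transition_gaps(case):
            per_group.setdefault(transition, []).append(gap)
    return {g: {tr: mean(v) for tr, v in trs.items()} for g, trs in collected.items()}
```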
Global treatment biases. Global treatment biases are differences between patient groups in the global allocation of treatment sequences. Techniques that compare the distributions of patients over variants are applied. If the variant distributions differ, there are differences between patient groups in either the administration of certain treatments or their ordering, e.g., a lower likelihood of ventilation. In our baseline framework, we use Pearson's chi-square test [22] with the null hypothesis that the variant distributions are equal between two patient groups, and we conduct pairwise comparisons between patient groups.
Uncertainty estimation. The generated statistical measurements require an estimate of their uncertainty. In our baseline version of the framework, we determine 90% confidence intervals for all metrics, including times, attributes, and outcomes, by bootstrapping: by repeatedly resampling the data and recalculating the statistics, we obtain confidence intervals that indicate the reliability of each individual measure. Comparisons of distributions are supported by hypothesis testing, which provides significance levels for the difference in distributions between patient groups. We depict confidence intervals using the mean and the deviations from it that remain within the 90% confidence interval.
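A minimal percentile-bootstrap sketch for the 90% confidence interval of a mean is shown below. The percentile method, the number of resamples, and the fixed seed are assumptions; the paper states only that bootstrapping through resampling is used.

```python
import random


def bootstrap_ci(values, level=0.90, n_resamples=2000, seed=0):
    """Percentile bootstrap CI for the mean; returns (mean, lower, upper)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(values)
    # Resample with replacement and recompute the mean each time.
    estimates = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    alpha = (1 - level) / 2
    lower = estimates[int(round(alpha * n_resamples))]
    upper = estimates[int(round((1 - alpha) * n_resamples)) - 1]
    return sum(values) / n, lower, upper
```

The deviations `mean - lower` and `upper - mean` correspond to the error bars described above.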

Cohort development
The Medical Information Mart for Intensive Care IV (MIMIC-IV) [23], a single-center database containing de-identified structured and free-text electronic health record (EHR) data from patients admitted to Beth Israel Deaconess Medical Center between 2008 and 2019, was used. We selected a cohort of all adult (≥18 years of age) patients admitted to an ICU with a primary diagnosis of sepsis and one of three comorbidities: congestive heart failure, myocardial infarction, or chronic pulmonary disease. For each patient, we extracted demographics, including age, sex, self-reported race, language proficiency (English-speaking or non-English-speaking), and comorbidities, as well as treatment events (see below), the start time of each treatment event, the patient's Sequential Organ Failure Assessment (SOFA) score at the start time of the event, and the end time of the event. All code is publicly available on GitHub (https://github.com/niklasadams/CareDisparities). This study is reported in accordance with the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) statement (cf. S1 Fig). MIMIC-IV has been previously approved for research use by the Beth Israel Deaconess Medical Center Institutional Review Board. All individuals involved in this study completed the required data and specimens human subjects research training and signed the requisite data use agreement.
For the event log, we extract treatment events corresponding to mechanical ventilation, the administration of vasopressors, and renal replacement therapy. The SOFA score of each event is the 24-hour SOFA score at the start of the event. If no SOFA value is available at that time, we use the next available value.
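The SOFA lookup rule can be sketched with a binary search. This assumes the 24-hour SOFA scores are stored per patient as a time-sorted list of `(time, score)` pairs on the same time grid as event start times (for example, hours since admission); the helper name and representation are hypothetical.

```python
from bisect import bisect_left


def sofa_at_event(sofa_series, start):
    """Score recorded at `start`, else the next later score, else None.

    sofa_series: list of (time, score) sorted by time (hypothetical representation).
    """
    times = [t for t, _ in sofa_series]
    idx = bisect_left(times, start)  # first entry with time >= start
    return sofa_series[idx][1] if idx < len(sofa_series) else None
```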
We determine the treatment variant of each patient and filter out infrequent variants with fewer than 50 patients, as these lack statistical support for further analysis. This yielded 24 variants. We depict the top 5 variants with aggregated attributes for sex and language in S4 Fig through S13 Fig. On the final variant set, we perform two splits according to the sensitive attributes of sex and language.
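The frequency filter can be sketched with `collections.Counter`. The input, a mapping from patient identifier to variant, and the function name are illustrative assumptions; the threshold of 50 patients is taken from the text.

```python
from collections import Counter


def frequent_variants(variant_of, min_patients=50):
    """Drop variants with fewer than `min_patients` patients.

    variant_of: dict mapping patient id -> variant.
    Returns (filtered mapping, kept variants ranked by frequency).
    """
    counts = Counter(variant_of.values())
    ranked = [(v, c) for v, c in counts.most_common() if c >= min_patients]
    keep = {v for v, _ in ranked}
    return {pid: v for pid, v in variant_of.items() if v in keep}, ranked
```

The first entries of `ranked` correspond to the most frequent variants, such as the top variants depicted in the supplementary figures.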

Bias analysis
Patient sex. The descriptive statistics for the age, English proficiency, and ethnicity of female and male patients are shown in Table 1. The distribution of treatment variants did not differ significantly by sex (cf. S2 Fig): with a p-value of 0.035, we cannot reject the null hypothesis of equal distributions at significance levels of 1% or 2.5%.
For the multiple- and single-treatment step analyses, we investigate the third most frequent treatment variant, depicted in Fig 2, which has a large statistical support of 1538 patients. This treatment variant describes the following treatment sequence: after ICU admission, the patients are ventilated. After the start of ventilation, the patients receive vasopressors. The administration of vasopressors is stopped at some point, after which ventilation is also stopped. Finally, the patient is discharged. The SOFA scores of male and female patients are in a similar range upon admission, as are the times until ventilation is started. For the remaining treatments, we observe similar treatment durations and times between treatments. There are only minor trends toward longer treatment durations and shorter times between treatments for female patients, even though the average SOFA score of female patients is slightly below that of male patients. The outcomes also show no large disparities between male and female patients. Overall, we do not observe any significant sex-based disparity in this treatment variant.
Language. The descriptive statistics for the age, sex, and ethnicity of English-speaking and non-English-speaking patients are shown in Table 2. There was a significant difference in the distribution of treatment variants by language (cf. S3 Fig): with a p-value of 0.00025, we can reject the null hypothesis at any typical significance level.
The analyzed variant with metrics split by patient language is depicted in Fig 3. The SOFA scores at admission are in a similar range. However, the time to ventilation is longer for non-English-speaking (non-ESP) patients. Afterward, the remaining treatment durations and times between treatments cover similar ranges, with slightly wider confidence intervals for non-ESP patients. While the other treatment steps and times between treatments show no significant differences, the time to ventilation stands out as longer for non-ESP patients. To check for confounding effects, we depict the same variant split by patient ethnicity in S16 Fig. The longer time to ventilation is not associated with large disparities in the outcomes of mortality and length of hospital stay.

Discussion
This paper makes two main contributions: the proposed process mining framework and its application to sepsis patients, which uncovers inequities. First, we discuss the conditions for the general application of the framework and how it subsumes existing care disparities research. Second, we contextualize the findings from applying our framework to MIMIC-IV within the existing literature.
In the broader scope of health inequities, multiple sources of disparity have been investigated and confirmed. While our framework can be used to systematically analyze inequities stemming from the care process, it cannot be used to assess other sources of inequity, e.g., structural inequities. The most central limitation for the practical application of our framework is its data requirements. Because we split patients according to their treatment variant, the data requirements naturally increase compared to a traditional approach that considers all patients together. To achieve a given statistical robustness, a minimum number of patients is needed within each treatment variant. Controlling for confounders and comorbidities requires splitting the data further, which increases the necessary sample size per subgroup. Therefore, our more granular analysis comes at the cost of larger data requirements than traditional approaches. However, the framework is designed for use with large-scale electronic health records, which helps meet these requirements. Applying the framework on a national or global scale would require standardized data integration, which would also help meet the data requirements. Such standardized data integration would further help low- and middle-income countries apply the analysis by sparing them the technical components and, thus, reducing the required resources.
Our framework considers bias analysis on three levels: single treatment steps, multiple treatment steps, and global treatment. Individual studies have been proposed to uncover specific biases at each of these levels. In the following paragraphs, we discuss how these studies fit into our framework and, therefore, how our framework subsumes existing research.
The following specific biases described in the literature can be calculated as single-treatment step biases. First, studies have investigated whether the administration of drugs such as antibiotics adheres to guidelines [8,12]. Such guideline adherence is checked by comparing the attributes of a single treatment step, e.g., the administered dose or the type of antibiotic, to those specified in the guidelines. Second, some treatment steps are associated with better outcomes if conducted by a trained specialist, which is called subspecialty care [15]. The person who conducted a treatment step would be recorded as an attribute, making it possible to assess the competency of the healthcare provider. Third, the administered amount of drugs, e.g., antibiotics, has been used to explain differences in outcomes between races [24].
The most important multiple-treatment step biases investigated so far are time differences between treatment steps, especially the time to antibiotics [8,9,12] or the time to resuscitation [25,26]. These describe the time elapsed between two (or potentially more) treatment steps, e.g., between admission to the ICU and the first dose of antibiotics. Another area of research assesses subspecialty care across multiple treatment steps [15].
So far, global treatment biases with respect to the treatment process have not been explored. We present a collection of standard process mining algorithms and the biases they can uncover. By comparing the distribution of treatment variants, one can uncover whether certain variants are less pronounced in certain patient groups, e.g., variants in which a treatment step is absent. This can be used to analyze the frequency of different treatment steps, such as ICU transfer, referral to treatments, or specific treatment steps [13,27,28], receiving preventive care [14], emergency department admission [29], or discharge events after cardiac arrest [30]. This global view can also be used to explore patterns and global trends in the treatment of different patient groups. Techniques from local process mining [31] or pattern mining [32] could point to frequent sub-treatment patterns, while techniques from process discovery could uncover differences between patient groups in the global models that describe possible treatment pathways [20]. Concept drift detection methods could investigate whether the processes of different patient groups evolved in the same way [33].
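As a small illustration of checking whether a treatment step is absent more often in one patient group, the sketch below computes, per group, the share of patients whose variant contains a given marker. Variants are assumed to be tuples of start/end markers such as `"ventilation:start"`; this marker format and the function name are hypothetical.

```python
def step_presence_rate(variant_of, group_of, step):
    """Share of patients in each group whose variant contains `step`.

    variant_of: dict patient id -> variant (tuple of markers, hypothetical format).
    group_of: dict patient id -> value of the sensitive attribute.
    """
    totals, hits = {}, {}
    for pid, variant in variant_of.items():
        g = group_of[pid]
        totals[g] = totals.get(g, 0) + 1
        hits[g] = hits.get(g, 0) + (step in variant)  # bool counts as 0 or 1
    return {g: hits[g] / totals[g] for g in totals}
```

A marked gap between groups would point to the variants to inspect further, e.g., with the chi-square comparison of variant distributions.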
In our application of the framework, we found a longer time to the first treatment, i.e., ventilation, for non-ESP patients, while there was no significant difference between male and female patients. Times to treatment have been studied as one source of disparities [8,9,12]. Our findings confirm that such disparities are present for non-ESP patients. They might be explained by the language barrier, which would call for greater language capabilities in the healthcare system.
There are several limitations in our application of process mining to evaluate health disparities in sepsis care for MIMIC-IV ICU patients. First, as language is confounded with race/ethnicity, we cannot exclude a racial influence on the time to ventilation. Specifically, Hispanic patients make up 21% of the non-English-speaking patients, compared to only 1% of English-speaking patients. Therefore, the longer times to ventilation for non-English-speaking patients could also be due to the large share of Hispanic patients or to a factor confounding both. The same holds for age. Second, we had less data for non-ESP patients, which leads to wider confidence intervals, as the variance of the estimated mean depends on the sample size. Third, we grouped several similar, highly influential comorbidities to control for them. Considering them individually might increase the reliability of the results; however, it also requires more data. Fourth, the reliance on electronic health records might itself introduce selection bias, as certain important information might not be recorded in the information system. A general limitation of this study is the reliance on a single-center database. While our application to MIMIC-IV functions as a proof of concept, future applications of our framework to different databases should demonstrate its generalizability. All in all, these limitations highlight the need for large amounts of data to obtain reliable results. Large studies across multiple hospitals are required to answer questions about disparities.
Applying our framework requires the execution of several steps. First, we extracted the event log from the electronic health records. On the one hand, this requires knowledge of data extraction and transformation; on the other hand, it also requires medical knowledge to identify relevant treatment steps. Therefore, event log extraction necessitates interdisciplinary collaboration between data scientists and medical staff. Later steps of the framework require further collaboration to provide a medically and statistically sound interpretation of the results.
In this paper, we introduced a framework to assess biases and disparities in treatment processes using process mining. The framework builds on two main steps: the extraction of an event log, providing sequences of treatments for patients, and treatment variant analysis, comparing the global allocation of treatment sequences as well as the differences between groups within the same treatment variant. The framework subsumes existing approaches to uncover care disparities and provides a general resource that helps researchers answer questions about inequities in care delivery. To demonstrate this, we applied our framework to ICU patients in MIMIC-IV diagnosed with sepsis. We found that non-English-speaking patients had longer times to the first treatment, i.e., ventilation, while there were no significant sex differences in the time to first treatment. Our framework can become part of systematic efforts to tackle health inequities and presents an additional toolkit to implement the monitoring measures required by the US government at the individual and global levels. Future work can integrate other aspects of process mining into a comparative framework, such as conformance checking for guideline adherence.

Fig 1. Framework overview. Overview of the framework to assess bias in treatments. The treatment variants, along with the outcomes of all patients, are extracted from the event log of the EHS. Subsequently, the variants are split according to a sensitive attribute, and we conduct bias analysis for single treatments, multiple treatments, and the global allocation of treatments. https://doi.org/10.1371/journal.pdig.0000575.g001