A simulation model for predicting hospital occupancy for Covid-19 using archetype analysis

The COVID-19 pandemic has sent millions of people to hospitals worldwide, exhausting on many occasions the capacity of healthcare systems to provide the care patients required to survive. Although several epidemiological research works have contributed a variety of models and approaches to anticipate the pandemic spread, very few have tried to translate the output of these models into hospital service requirements, particularly in terms of bed occupancy, a key question for hospital managers. This paper proposes a tool for predicting the current and future occupancy of a hospital associated with COVID-19 patients, to help managers make informed decisions that maximize the availability of hospitalization and intensive care unit (ICU) beds and ensure adequate access to services for confirmed COVID-19 patients. The proposed tool uses a discrete event simulation approach built on archetypes (i.e., empirical models of trajectories) extracted from the analysis of actual patient trajectories. Archetypes can be fitted to trajectories observed in different regions or to the particularities of current and forthcoming variants using a rather small amount of data. Numerical experiments on realistic instances demonstrate the accuracy of the tool’s predictions and illustrate how it can support managers in their daily decisions concerning the system’s capacity and ensure that patients have access to the resources they require.


Introduction
The COVID-19 pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was first declared a Public Health Emergency of International Concern and later, on 11 March 2020, a global pandemic. The pandemic has resulted in significant social and economic impacts, and a large set of preventive measures, including social distancing, the broad use of face masks, monitoring and self-isolation of exposed or symptomatic people, and curfews, has been deployed worldwide.
Healthcare systems are in particularly high demand during the successive COVID-19 waves. Although most people infected by COVID-19 develop mild or uncomplicated illness, it has been reported that, during the first and second waves, approximately 14% of infected individuals developed severe forms of the disease, which require hospitalization and oxygen support and necessitate regular monitoring of vital signs to facilitate early recognition of a deteriorating patient and escalation of care [1]. Deterioration in adults is often sudden and may occur in the second week of illness. Critical symptoms such as pneumonia, Acute Respiratory Distress Syndrome (ARDS), sepsis and septic shock, and multiorgan failure, including acute kidney injury and cardiac injury, can develop. Admission to an intensive care unit (ICU), where mechanical ventilation and continuous monitoring are available, is therefore mandatory to minimize life-threatening risks [2].
* Correspondence to: Pavillon Palasis-Prince, Université Laval, 2325 Rue de la Terrasse, Quebec City, QC, Canada G1V 0A6.
COVID-19 has demonstrated that, despite the efforts of individuals and governments to mitigate its rapid progression, healthcare systems must prepare and plan adequate resources to provide the right support to severe and critical patients. Although huge efforts have been made to forecast the spread of COVID-19 in the population, considerably less research has been devoted to assessing the impact of COVID-19 on capacity management at the hospital level. The primary goal of this paper is therefore to support capacity management during a pandemic, rather than to address the epidemic dynamics of COVID-19. In the context of COVID-19 spread, this paper presents a discrete event simulation approach based on archetypes extracted from empirical analysis of patient trajectories to support decision-makers who need to manage the capacity of a hospital or a set of hospitals to ensure adequate access for confirmed COVID-19 patients. It is worth mentioning that this work has not been developed to fit the specific case of a particular hospital, although it is inspired by our observations in two hospitals in the Valencian Community, Spain. Also, notice that the paper considers neither non-COVID-19 patients nor the services the hospital provides to them. It is reasonable to assume that, given the infectious nature of the disease, COVID-19 patients cannot or should not mix with other patients; planning the resources devoted to COVID-19 independently therefore seems appropriate. Our goal is to provide a tool able to anticipate, over a given planning horizon, the current and future hospital occupancy associated with COVID-19 patients. This should help managers make the right decisions to maximize the availability of critical resources such as hospitalization beds for severe COVID-19 patients, and ICU beds and ventilators for critical patients.
To this end, this paper proposes a data-driven approach that models the clinical trajectories of COVID-19 patients in terms of resource usage and then simulates their sojourn at the hospital. Trajectories are inspired by a compartmental SIR (Susceptible-Infected-Recovered) based model, which has been adapted to account for particularities of COVID-19 clinical pathways. Numerical experiments assess the accuracy of the occupancy predicted by the proposed tool and illustrate how it can support managers in their daily decisions concerning the system's capacity and ensure that patients have access to the resources they require.
The rest of the paper is structured as follows. The next section provides a brief overview of works that aim to estimate the healthcare resources required to cover the needs of a population affected by COVID-19. Then, the proposed hybrid approach is presented, followed by a description of experiments and results. Conclusions and further research avenues close the paper.

Literature review
Operations research and operations management researchers have extensively discussed the limited availability of healthcare resources during the last two decades, with special emphasis on resource allocation in various contexts such as primary and secondary care, outpatient scheduling, emergency department, operating room, and home care, among others [3][4][5][6]. In the context of the COVID-19 pandemic, the number of published scientific contributions focused on biological and clinical aspects, but also on logistical and planning issues, has increased very fast. Since our goal is to anticipate hospital occupancy, we do not review contributions or models aiming at predicting the spread of the disease or the patients' arrivals to hospitals, but rather papers proposing models and approaches that explicitly represent or capture the patients' flow through the hospital services.
To this end, we conducted a literature search from January 9, 2023, to March 1, 2023. The date of the last performed search was January 23, 2023. The search was performed on Web of Science (all the available databases). The first search required the following combination of keywords: [((covid) OR (corona)) AND (simulation)] in the title or in the abstract. The output of this search included a very large number of papers not related to our study, such as those dealing with molecular research, vaccination strategies and logistics, on-site simulation as a tool for professional training, or even the use of simulation for education in medical curricula during the pandemic. We removed most of them by filtering out from the original query the papers that contained the keywords: [(training) OR (in situ) OR (molecular) OR (education) OR (vaccin*)] in the title or in the abstract.
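The two keyword combinations above can be read as a single boolean filter on title and abstract. The sketch below is only an illustration of that logic: the record layout, the `keep` function, and the use of case-insensitive substring matching are assumptions, and they do not necessarily reproduce how Web of Science evaluates a query.

```python
import re

# Keyword groups taken from the two queries; vaccin* is written as a regex.
INCLUDE_TOPIC = r"covid|corona"
INCLUDE_METHOD = r"simulation"
EXCLUDE = r"training|in situ|molecular|education|vaccin\w*"


def matches(text: str, pattern: str) -> bool:
    """Case-insensitive substring match (an assumption of this sketch)."""
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def keep(record: dict) -> bool:
    """Apply ((covid OR corona) AND simulation) AND NOT (excluded keywords)."""
    text = f"{record['title']} {record['abstract']}"
    included = matches(text, INCLUDE_TOPIC) and matches(text, INCLUDE_METHOD)
    return included and not matches(text, EXCLUDE)


# Hypothetical records, invented for illustration only.
records = [
    {"title": "A discrete event simulation of COVID-19 bed occupancy",
     "abstract": "We predict ward and ICU occupancy."},
    {"title": "In situ simulation for COVID-19 staff training",
     "abstract": "A drill for emergency teams."},
    {"title": "Molecular dynamics simulation of coronavirus spike protein",
     "abstract": "Binding free energies are computed."},
    {"title": "Simulation of urban traffic flow",
     "abstract": "A queueing study."},
]

kept_titles = [r["title"] for r in records if keep(r)]
```

Only the first hypothetical record survives both steps; the others are dropped either by the exclusion keywords or because they do not mention the disease.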
Inclusion Criteria. The first inclusion criterion was that the article described a quantitative model or tool to estimate or anticipate hospital occupancy or capacity during the COVID-19 pandemic. Second, we ensured that the model included the possibility to define a population served by a single hospital or a group of hospitals. The third inclusion criterion was that the article must have investigated the usage or occupancy of at least two different resources within the hospital.
Exclusion Criteria. First, we limited our search to English language articles. Second, we excluded models that focused on forecasting cases or the epidemic curve without hospital parameters. Third, we excluded models that did not concern hospital settings (for instance, outpatient clinics). Fourth, inspired by an early systematic literature review [7] on models supporting hospitals' preparedness for COVID-19, we also excluded models that focused on hospital resources without consideration of COVID-19 caseloads or hospital length of stay.

Results.
Our final query produced a dataset of 5901 papers. We screened titles and abstracts and selected 65 papers that met at least one inclusion criterion. We then read the full text of these 65 articles. After considering all inclusion and exclusion criteria, 31 articles were retained for a detailed review. The other articles failed the inclusion criteria or met an exclusion criterion because (i) 6 articles focused on surgical backlogs provoked by COVID-19, (ii) 11 articles studied approaches to allocate scarce resources (ventilators) or to predict the risk of intensive care for arriving patients, and (iii) 17 papers reported healthcare contexts concerning outpatient settings or clinics.
Analysis. The retained papers model the flow of patients through the hospital. Our review allowed us to identify four main approaches to do so: (i) statistical or time series models that forecast service usage or hospital occupancy as a function of the observed hospitalizations or data, (ii) multistate models that represent the patients' trajectories between services in the hospital, (iii) agent-based simulations, and (iv) individual or discrete event simulations.
Statistical or time series models. Most of the early works fall into the first category. Wells et al. [8] is one of the pioneering works addressing hospital resource planning to cope with the COVID-19 surge. They use estimates of COVID-19 hospitalizations and data on the proportion of patients with COVID-19 in the ICU requiring ventilation to draw a rough estimate of the number of ventilators needed at the outbreak peak in the USA. Area et al. [9], Rainisch et al. [10], Weissman et al. [11], McCabe et al. [12] and Warde et al. [13] first use epidemiological models (variants of SIR models) to anticipate the number of infected individuals or arrivals to the hospital, which are then translated into intensive care beds or ventilators. For instance, Area et al. [9], Weissman et al. [11], and Rainisch et al. [10] collected data published in the literature on healthcare usage and outcomes among infected individuals and applied them to their predictions of the pandemic's local spread to estimate hospitalizations, admissions to intensive care units, and casualties. Area et al. [9] used their model to help public health officials estimate the future impact of the COVID-19 outbreak on demand for healthcare resources and to examine the costs and benefits of various intervention strategies. Weissman et al. [11] also used published data reporting on the clinical course of patients observed early in the epidemic to set their model parameters (e.g., the proportion of infections requiring hospitalization, the proportion of hospitalizations admitted to intensive care, the proportion of patients in intensive care requiring ventilation, and the hospital and intensive care length of stay). Finally, they executed a Monte Carlo simulation, sampling the values of these parameters to compute expected resource occupancy as well as confidence intervals. Warde et al. [13] used a system dynamics approach to simulate the dynamics of the disease and estimate the patients' arrivals to hospitals, assuming that 19.5% of them required hospitalization. They modeled hospital length of stay dynamically, using the data of patients treated so far, to assess hospital occupancy.
Kozyreff [14] proposes an analytical formula to estimate the number of beds occupied by COVID-19 patients from infection forecasts. The analytical curve was fitted to data from Belgium, France, New York City and Switzerland. The fitting was used to extract estimates of the mean recovery time and the mean hospitalization time, although large variations were observed among different outbreaks. McCabe et al. [12] include a 'dual-demand' model of care requirements incorporating demand from COVID-19 and non-COVID-19 cohorts. The former is projected under different epidemiological scenarios whereas the latter is estimated using average annual occupancy figures. The requirements of each resource are calculated per patient, with multiple data sources used to parameterize the model.
Berta et al. [15] propose a Vector AutoRegressive (VAR) model to forecast the daily counts of hospitalized patients with symptoms and of patients in intensive care, using publicly available data to produce one-week-ahead forecasts. Panaggio et al. [16] propose a time-series forecasting method based on SARIMA (Seasonal Auto-Regressive Integrated Moving Average) models to forecast the total number of confirmed COVID-19 hospital admissions as well as the staffed inpatient and intensive care unit beds used by COVID-19 patients. Zhang et al. [17,18] created a machine learning method that uses information currently known about a region (i.e., resource utilization, pandemic progress, population mobility, weather conditions, and public policy) to forecast hospital resource utilization (i.e., the number of hospital beds, intensive care beds, and ventilators) several weeks in advance. Finally, Putra et al. [19] focus on the hospitalization needs of women giving birth. To this end, they collected the incidence of COVID-19 in pregnancy for each age group and used the lowest and highest values found as the lower and upper bounds of uniform distributions to build a probabilistic model for each group. Then, as in Weissman et al. [11], a Monte Carlo simulation approach was used to draw the specific proportions of women using hospitalization resources during delivery.
Multistate or compartmental models. Multistate models aim to capture (i) the evolution of the hospitalized patients' symptoms (severe or critical) and therefore the care they require (ward, intensive care, ventilators), or (ii) the flow of patients between units based on empirical observation of patients' pathways. A network is built where nodes correspond to patients' states in the former case, or to services or units at the hospital in the latter, and the probabilities of transition between nodes are then fitted from local data or inspired by data in the literature.
SIR-based models that capture the needs of patients arriving at hospitals basically split patients into those in general hospitalization, those requiring intensive care, deceased, and recovered. Arslan et al. [20] use the actual number of deaths in Turkey to estimate the number of people infected in the community, and then estimate the expected total numbers of deaths and hospitalizations, which are fed to their hospital SIR-based multistate model called TURKSAS. Parameters of TURKSAS (i.e., probabilities of transition between patients' states) are assumed constant. Shin et al. [21] and Luensmann et al. [22] also proposed SIR-based models, but they consider the parameters of their multistate models as dynamic, and both fitted them by least squares to moving time-window data (8 days in the case of [22], and two weeks in [21]). Implicitly, this assumes that the trends in these parameters and in the number of people requiring hospital care remain unchanged over these horizons.
Several multistate models based on the observed patients' pathways have also been proposed. Lam et al. [23] describe the development of a dynamic simulation framework to support agile resource planning during the COVID-19 pandemic in Singapore. They propose a set of differential equations representing the flows of patients between the different medical resources (external dormitories, isolation, wards, intensive care), with length of stay estimated from historical data. Their model is therefore a deterministic one, although parameters (e.g., length of stay in intensive care) can be changed dynamically. They performed a sensitivity analysis for various ranges of length of stay in intensive care and other services, and tested the expected demand coverage for three patient arrival scenarios. Chertok et al. [24] use a SIR model to estimate the arrival of patients to the hospital and then a multistate model to project short-term hospitalizations, intensive care and ventilator placements, and deaths. They consider transitions between three locations: ward, intensive care but not intubated (elevated acuity), and intubated (highest acuity). Data from NorthShore University HealthSystem, Chicago, Illinois, USA, were used to set the parameters of their models and the probabilities of transition.
Parameters of multistate models can also be represented by probability distributions to better capture their variability. Heins et al. [25] collected daily information on the real number of infected people and applied a polynomial regression to predict further arrivals to hospitals. Then, assuming that the length of stay at each location follows a triangular distribution with parameters (minimum, mode, maximum), they executed a Monte Carlo simulation to forecast the short-term bed occupancy of patients with confirmed and suspected COVID-19 in intensive care units and regular wards.
Deschepper et al. [26] propose a planning tool combining a Poisson model for the number of newly admitted patients on each day with a multistate model to predict the flow of patients in the hospital. All possible transitions between the wards Non-COVID-19, Covid, intensive care Midcare, intensive care Standard, and intensive care Ventilated are considered. The model assumes the Markov property, which implies that the previous pathway of a patient in the hospital does not influence their next transitions.
Agent-based simulation (ABS) models. ABS models have also been used for estimating demand for hospital beds during the COVID-19 pandemic. Preiss et al. [27] and Hadley et al. [28] describe an agent-based model built to forecast the total demand for intensive care and ward beds in North Carolina, USA. They first assigned the same probability of hospitalization to all agents with COVID-19 and then considered the age and comorbidity status of agents in the synthetic population to improve performance.
Discrete event simulation (DES) models. Unlike multistate models, which concern flows, DES models explicitly represent the pathway of each individual patient through the hospital. Wood et al. [29] propose a multiserver queueing model for evaluating scenarios to mitigate intensive care backlog for COVID-19 patients. Their model (M(t)/G/C/C) assumes a time-dependent Markovian arrival process to the ICU and a general distribution for the patient's length of stay (the same for all patients). Since the number of servers equals capacity, patients are rejected if there is no available server (bed) upon their arrival. The model is implemented as a DES. Although this model concerns only the intensive care unit, other models describe more complex settings that simultaneously include ward and intensive care beds. In the models proposed by Caro et al. [30] and Garcia-Vicuna et al. [31], patients can be admitted either to intensive care or first to a hospital ward, with potential transfer to intensive care if their condition deteriorates. Discharge from a ward can follow either death or a health improvement. Patient transfers from the intensive care unit to a hospital ward occur after a health improvement. Length of stay at each state (unit) is modeled by probability distribution functions that are sampled in a Monte Carlo simulation to estimate hospital occupancy. Caro et al. [30] chose the Weibull distribution to model length of stay because it allows for the longer tail expected by the hospital and encompasses the simpler exponential distribution as a special case. The model recorded, for each patient, the dates of all changes in location of care; these were accumulated to yield the numbers of patients in each location on any given day and compared with the capacity limits set by the user for each location to determine whether they were exceeded. In other words, the model did not explicitly handle the hospital's capacity as a constraint. According to the authors, this approach was implemented because the hospital team believed that all hospitals have means to expand their capacity, so implementing queues was less useful and not realistic. Like the previous works, Tavakoli et al. [32] also use DES to predict, given the expected daily arrivals to the hospital, patients' transitions to and from the ward and the intensive care unit, up to discharge or death. Their model was parameterized with the data of a hospital in Iran, and they proposed a DEA (Data Envelopment Analysis) approach to compare the expected performance of 27 potential scenarios in which the available resources (number of beds and workforce at each considered service unit) varied. Shahverdi et al. [33] propose a DES model of a generic 200-bed urban U.S. tertiary hospital to investigate how hospital functionality may be affected by the care of COVID-19 pandemic patients along specially designated care paths, under changing pandemic situations. Melman et al. [34] also propose a DES model but, unlike the previous models, they handle three distinct patient flows: COVID-19, elective surgery, and emergency surgery patients. Indeed, surgery and severe COVID-19 patients share intensive care beds and resources, so the authors propose strategies to adapt the surgical plan according to the foreseen availability of intensive care beds. COVID-19 patients' inflows were modeled by a state network. The parameters of their DES (mainly length of stay at each location of the model) were fitted from historical data of 475 COVID-19 patients and 28,831 non-COVID-19 patients in Addenbrooke's hospital in the UK.
Finally, we deem it worth describing the contributions of Sanchez-Taltavull et al. [35] and Mascha et al. [36] which, although they do not focus on bed or ventilator capacity, address the potential reduction in activities or capacity provoked by the infection of workers. Mascha et al. [36] define two potential scenarios of in-hospital infection, including a COVID-19 infection probability of 10%, 25%, or 40% per week, over a period of 120 days. Their simulations show workforce savings due to rotating staff each week for each infection probability, most noticeably in the first 6-10 weeks. Sanchez-Taltavull et al. [35] propose an extension of the SIR model (SLIRSW) to predict the impact of COVID-19 on health workers. Workers can be healthy and susceptible to infection (S), infected and in the incubation period (L), infected and presenting symptoms (I), or recovered and temporarily immune to new infections (R). Eventually the recovered workers lose their immunity and become susceptible again. They use these models to analyze strategies to improve workforce productivity and efficiency.
Despite the valuable contributions of these works, most of them seek to estimate service demand during the COVID-19 pandemic, and only a few of them go further to, for instance, warn of the possible saturation of hospital capacity. Contributions linking predictions of needs to models or strategies to manage resources are scarce. For instance, Berger [37], Wood et al. [29] and Wood et al. [38] are, to the best of our knowledge, among the few papers explicitly discussing admission rules based on the expected hospital occupancy. Looking ahead to possible future pandemics, Fattahi et al. [39] address various resource planning strategies for healthcare systems facing pandemics. In particular, they propose a multi-stage stochastic program (MSSP) for integrated healthcare resource planning and demand redistribution during a pandemic. The demand for the model is created by an agent-based continuous-time stochastic model of COVID-19 transmission, followed by a scenario tree construction approach to capture the stochasticity of the number of infected individuals requiring hospitalization.
Our research is based on the use of empirical patients' trajectories, referred to as archetypes, that are inspired by observed patients' pathways through the hospital. The subtle yet important feature of our model is that, unlike previous models, it separates patients by specific trajectories and, for each stage or unit visited in the trajectory, it uses probability distributions to model the patient's length of stay (LOS). This is not the case in multistate models, where the transition probabilities concern all patients for all states, nor in DES models, where the course of each patient is "built" (see for instance [30]) as they progress through the hospital. We believe that the proposed approach has several advantages. With respect to multistate models, it allows patients to be separated into homogeneous categories to capture specific characteristics. In the case of COVID-19, it is thus possible to separate patients by sex or age to distinguish their respective LOS. This separation of patients can be easily implemented in DES models. However, with respect to them, training the trajectories of the patients outside the simulation makes it possible to reduce the variability of the results. Our research also sheds light on the performance of rules to manage admissions based on the expected available capacity, and on the optimal ratio of intensive care beds to ward beds.

A simulation model based on archetypes of COVID-19 patients' trajectories
As mentioned before, the focus of this paper is to study hospital capacity management. We assume that the epidemic dynamics of COVID-19 or, in other words, the arrivals of patients to the hospital, are an input to our tool. Therefore, the tool is solely concerned with patients who develop severe to critical conditions requiring hospitalization. We assume that all patients who develop severe or critical symptoms contact the healthcare system and are directed to a hospital. Although this hypothesis has drawbacks, mostly concerning vulnerable and isolated populations, we believe that it may be acceptable in the COVID-19 context, especially when, after several waves, the healthcare system and the population are aware of the disease's threats. In addition, we assume that once a severe COVID-19 patient is hospitalized, transfer to another hospital is to be avoided. Patients may, however, be diverted on arrival to other nearby hospitals that have available capacity.
Our tool is based on (i) the study of clinical pathways or patient trajectories as an aggregated representation of needs and interactions between patients and the healthcare system, and (ii) a discrete event simulation model (DES) able to capture the dynamics of the patient trajectories in time. These aspects are discussed in the next subsections.

Modeling COVID-19 patients' trajectories
We draw inspiration from the SIR (susceptible-infected-recovered) compartmental model to map the sequence of potential "states" followed by COVID-19 patients during their sojourn at a hospital. As can be seen in Fig. 1, we propose a compartmental model that includes four states (infected-severe-critical-recovered) and does not consider mild or asymptomatic cases. The leftmost part of Fig. 1 presents the progression of COVID-19 if no care is provided. After several days of incubation with mild or no symptoms, infected individuals (I) may recover (R) or develop severe symptoms (S). A few days later, they can recover or develop critical symptoms (C). If no life support is provided, a large part of them will die (D), although a fraction of them will recover. The compartmental model shown in Fig. 1 contains two other columns that represent the moment at which care begins in the patient trajectory, making it possible to differentiate, for instance, three types of critical state: C for a critical patient who has received no care, C_S for a critical patient who was hospitalized during the severe phase, and C_C for a critical patient who has received care only once critical symptoms appeared. To link patient clinical states to observable variables, care, and resources, we refer to the guidelines in Clinical management of patients with moderate to severe COVID-19 [40]. The guidelines propose specific treatments, and thus medical resources, for severe and critical symptoms. By linking patient states (severe and critical) to the resources they require (hospitalization and intensive care beds, respectively), the compartmental model produces the structure for patient pathways, referred to as archetypes, which constitute the base for the simulation model. Broadly speaking, these archetypes are characterized by a sequence of care stages that correspond to patients' states, and a length of stay (in days) at each stage. Each archetype is also assigned a probability of occurrence (the average chance that an arriving patient follows this path), which is fitted to historical data as will be discussed later. Fig. 2 presents a set of potential archetypes based on the compartmental model. Each archetype is associated with a probability, which corresponds to its prevalence (given by column %), and a length of stay (in days) at each stage of the patient's sojourn at the hospital. In the example presented in Fig. 2, severe patients (archetypes #1 to #3) are admitted and hospitalized. Most of them (69% in the example) follow archetype #1: after being admitted they spend five to seven days at the hospital, then they are discharged and recover. However, 20% of patients with severe symptoms (archetypes #2 and #3) evolve to critical after three to five days of hospitalization, requiring ventilation or life support at the ICU. Indeed, patients following archetype #3 will move to the ICU and, after seven to ten days, they will be moved back to a hospitalization bed, where they will stay five to seven additional days before being discharged.
Each archetype translates patient states (symptoms) into resource requirements, which in turn are used by the DES to estimate hospital's occupancy.
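As a minimal sketch of how an archetype ties stages to resources, the snippet below encodes archetypes #1 and #3 with the day ranges quoted in the Fig. 2 example and adds up the bed-days each one can consume per resource. The names `Stage`, `Archetype` and `bed_days` are illustrative assumptions and are not part of the proposed tool. Prevalences and archetype #2 are left out because the text does not fully specify them.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Stage:
    resource: str   # "ward" (hospitalization bed) or "icu" (intensive care bed)
    min_days: int   # shortest stay quoted in the example
    max_days: int   # longest stay quoted in the example


@dataclass(frozen=True)
class Archetype:
    name: str
    stages: Tuple[Stage, ...]   # ordered sequence of care stages


# Archetype #1: hospitalized five to seven days, then discharged.
ARCHETYPE_1 = Archetype("#1", (Stage("ward", 5, 7),))

# Archetype #3: ward (3-5 days), ICU (7-10 days), back to ward (5-7 days).
ARCHETYPE_3 = Archetype(
    "#3",
    (Stage("ward", 3, 5), Stage("icu", 7, 10), Stage("ward", 5, 7)),
)


def bed_days(archetype: Archetype) -> Dict[str, Tuple[int, int]]:
    """Return, per resource, the (minimum, maximum) total days of occupancy."""
    totals: Dict[str, Tuple[int, int]] = {}
    for stage in archetype.stages:
        low, high = totals.get(stage.resource, (0, 0))
        totals[stage.resource] = (low + stage.min_days, high + stage.max_days)
    return totals


requirements_1 = bed_days(ARCHETYPE_1)
requirements_3 = bed_days(ARCHETYPE_3)
```

Under these ranges, a patient following archetype #3 holds a ward bed for 8 to 12 days in total and an ICU bed for 7 to 10 days, whereas archetype #1 only requires a ward bed for 5 to 7 days.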

Empirical fitting of archetypes and their parameters
We assume that a data sample containing the full trajectories of a set of patients is available. The sample should make it possible to identify the sequence of units visited by each patient and the time spent at each unit. At this point, no other information is necessary to identify (i.e., define the sequence of units) and parameterize (i.e., set the length of stay) the archetypes.
Archetypes can be identified by clustering techniques but, in the particular case of COVID-19, they can be explicitly enumerated as illustrated in the previous subsection. The prevalence of each archetype is estimated as its frequency in the sample. For each archetype, data in the sample is used to select the probability distribution (and its parameters) modeling the length of stay at each visited unit. The candidate distributions considered are Weibull, Normal, LogNormal and Gamma, although other distributions could be considered. Finally, each arriving patient can be assigned a simulated trajectory by first sampling the archetype and then, once the archetype is selected, drawing the patient's length of stay at each stage or service from the corresponding distributions.
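The fitting and sampling steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the sample is invented, the function names are hypothetical, and only a LogNormal distribution is fitted (by maximum likelihood on the logarithm of the stays), so the step that selects the best distribution among the Weibull, Normal, LogNormal and Gamma candidates is omitted.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical sample: for each patient, the ordered (unit, days) stays.
sample = [
    (("ward", 6.0),),
    (("ward", 5.5),),
    (("ward", 7.0),),
    (("ward", 4.0), ("icu", 8.0), ("ward", 6.0)),
    (("ward", 3.5), ("icu", 9.5), ("ward", 5.0)),
]


def fit_archetypes(trajectories):
    """Estimate prevalences and LogNormal (mu, sigma) per archetype stage."""
    counts = Counter()
    log_stays = defaultdict(list)   # (unit sequence, stage index) -> log(days)
    for trajectory in trajectories:
        sequence = tuple(unit for unit, _ in trajectory)
        counts[sequence] += 1
        for index, (_, days) in enumerate(trajectory):
            log_stays[(sequence, index)].append(math.log(days))
    total = len(trajectories)
    prevalence = {seq: n / total for seq, n in counts.items()}
    params = {}
    for key, values in log_stays.items():
        mu = sum(values) / len(values)
        sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
        params[key] = (mu, sigma)
    return prevalence, params


def sample_trajectory(prevalence, params, rng):
    """Draw an archetype by prevalence, then one length of stay per stage."""
    sequences = list(prevalence)
    weights = [prevalence[seq] for seq in sequences]
    sequence = rng.choices(sequences, weights=weights)[0]
    return [
        (unit, rng.lognormvariate(*params[(sequence, index)]))
        for index, unit in enumerate(sequence)
    ]


prevalence, params = fit_archetypes(sample)
rng = random.Random(42)   # fixed seed so the draw is reproducible
trajectory = sample_trajectory(prevalence, params, rng)
```

Because the archetype is drawn first and the stays are drawn afterwards, the length of stay in a unit depends on the whole trajectory the patient follows, not only on the unit itself.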

Emulating the resources' allocation and the patients' trajectories in a hospital
A Discrete Event Simulation [41,42] is used to emulate both the resource allocation process and the movement of patients within the hospital. A DES, as defined in Kelton and Law [41], deals with the modeling of a system as it evolves by creating a representation in which the state variables change instantaneously at particular points in time. These points in time are the ones at which an event occurs, where an event is defined as an instantaneous occurrence that may change the system's state. The event management is provided by a simulation engine, a timing routine which moves the simulation clock from one event time to the next. In the proposed simulator, length of stay observations have been discretized on an hourly basis, so the simulation clock moves in steps of hours.
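The timing routine described above can be illustrated with a short next-event loop. This is a minimal sketch, not the simulator used in the paper: `Engine`, `schedule` and `run` are hypothetical names, the clock is an integer number of hours, and the 48-hour stay is an arbitrary value chosen for the example.

```python
import heapq
import itertools


class Engine:
    """Minimal next-event time-advance routine with an hourly clock."""

    def __init__(self):
        self.clock = 0                    # current simulation time (hours)
        self._events = []                 # heap of (time, order, action)
        self._order = itertools.count()   # tie-breaker for simultaneous events

    def schedule(self, time, action):
        """Register an event; an action is a callable taking the engine."""
        if time < self.clock:
            raise ValueError("cannot schedule an event in the past")
        heapq.heappush(self._events, (time, next(self._order), action))

    def run(self):
        """Jump from one event time to the next until no event remains."""
        while self._events:
            time, _, action = heapq.heappop(self._events)
            self.clock = time             # the state only changes at event times
            action(self)


log = []


def admission(engine):
    log.append((engine.clock, "admission"))
    engine.schedule(engine.clock + 48, departure)   # arbitrary 48-hour stay


def departure(engine):
    log.append((engine.clock, "departure"))


engine = Engine()
engine.schedule(3, admission)
engine.schedule(10, admission)
engine.run()
```

The clock never ticks through idle hours: it jumps directly to hours 3, 10, 51 and 58, which are the only instants at which the state changes in this example.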
Patients are modeled as entities, whilst hospitalization beds, intensive care beds and special equipment are modeled as resources whose number may or may not be limited by the user. Resources are assumed to be homogeneous and renewable and are allocated based on a first-arrived-first-served rule. A patient entity can be in one of the following states: rejected or diverted on arrival at the hospital, in treatment at a care unit i, ready to move to the next unit, blocked at a care unit i, discharged, and deceased. The states rejected, diverted, discharged, and deceased are final states for patients (there is no possible transition to other states). A blocked state indicates that, although the patient's medical condition calls for a transfer to the next care unit, there is no available capacity there, so the patient must remain in the current care unit. Transitions between states are triggered by dynamic events. In our case, we consider three types of events: (i) Admission of a patient, (ii) Departure of a patient to the next care unit, and (iii) Arrival at unit. The first two events are said to be bounded (B) because their execution date can be predicted (or simulated by sampling from a probability distribution) by the system, and the third event is said to be conditional (C) because its execution time depends on the system's state. Each time an event is executed, the system state (and possibly the entities' states) is modified according to specific rules, and the simulation clock moves to the next event on the list.
At the beginning of the simulation, the list of patients and their arrival times are read. These arrival times correspond to Admission events. Each patient is then randomly assigned to an archetype with a probability equal to that archetype's prevalence, which gives the sequence of care units that the patient will visit. The length of stay distributions associated with each care unit in the archetype are sampled to simulate the time at which the patient should leave a given unit and move to the next. These times correspond to Departure to the next care unit events. When a Departure event is executed, the patient's state is changed to ready to move. An Arrival at unit event happens if a patient's state is ready and there is an available bed at the destination unit. In that case, the patient's state changes to treatment, and the number of available beds at the unit is reduced by one. If there is no available bed at the destination unit, the patient's state is changed to blocked. Arrival events are therefore also executed when a patient is blocked and a bed becomes free at the patient's destination unit.
The simulator, which was coded in the ''R'' software environment following the 3-phased approach proposed by Pidd [42], iterates between the execution of B events and C events, and a clock advancement routine moves to the next event time once all the events scheduled for the same date have been executed. The simulation stops when the simulation clock reaches a user-defined final date or when all the events have been executed. We refer the interested reader to Pidd [42] for further details on the 3-phased approach for implementing discrete event simulations.
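To make the event logic more concrete, the following Python sketch shows one possible three-phase loop (advance the clock, run the bound events, then retry the conditional events until none can fire). It is not the authors' R implementation: the function simulate, the data layout of patients and capacity, and the simplification that a patient who finds no bed on admission is rejected at once are all assumptions made for this example. Lengths of stay are taken as whole hours, in line with the hourly clock described above.

```python
import heapq
from itertools import count

def simulate(patients, capacity, horizon):
    """patients: list of {"arrival": int, "stages": [(unit, los_hours), ...]}
    capacity: dict unit -> number of beds. Returns patient index -> state."""
    free = dict(capacity)
    tie = count()                  # tie-breaker so the heap never compares strings
    events = []                    # bound (B) events: (time, tie, kind, patient)
    for pid, p in enumerate(patients):
        heapq.heappush(events, (p["arrival"], next(tie), "admission", pid))
    state = {pid: "not arrived" for pid in range(len(patients))}
    stage = {pid: -1 for pid in range(len(patients))}   # index of current unit
    waiting = []                   # FIFO list of patients wanting their next unit

    while events and events[0][0] <= horizon:
        clock = events[0][0]                          # A phase: advance the clock
        while events and events[0][0] == clock:       # B phase: bound events
            _, _, kind, pid = heapq.heappop(events)
            stages = patients[pid]["stages"]
            if kind == "departure" and stage[pid] == len(stages) - 1:
                free[stages[stage[pid]][0]] += 1      # last unit: release the bed
                state[pid] = "discharged"
            else:
                state[pid] = "ready"
                waiting.append(pid)
        progress = True                               # C phase: repeat until stable
        while progress:
            progress = False
            remaining = []
            for pid in waiting:
                stages = patients[pid]["stages"]
                unit, los = stages[stage[pid] + 1]
                if free[unit] > 0:                    # "Arrival at unit" can fire
                    if stage[pid] >= 0:
                        free[stages[stage[pid]][0]] += 1   # leave previous bed
                    free[unit] -= 1
                    stage[pid] += 1
                    state[pid] = "treatment"
                    heapq.heappush(events, (clock + los, next(tie), "departure", pid))
                    progress = True
                else:
                    remaining.append(pid)
            waiting = remaining
        still_blocked = []
        for pid in waiting:                           # patients who could not move
            if stage[pid] < 0:
                state[pid] = "rejected"               # no bed on admission
            else:
                state[pid] = "blocked"                # keeps the current bed
                still_blocked.append(pid)
        waiting = still_blocked
    return state

patients = [
    {"arrival": 0, "stages": [("H", 5)]},
    {"arrival": 1, "stages": [("H", 3)]},             # arrives while the only bed is taken
    {"arrival": 6, "stages": [("H", 2), ("ICU", 4)]},
]
final_states = simulate(patients, {"H": 1, "ICU": 1}, horizon=100)
```

Repeating the C phase until no conditional event can fire matters because one transfer frees a bed that may unblock another patient at the same clock time.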

Numerical experiments
The purpose of this section is to (i) validate and assess the quality and accuracy of the predictions provided by the proposed tool, and (ii) demonstrate how predictions translate into added value for managers who want to determine whether or not a hospital will have enough capacity to provide the required services to an incoming patient. To this end, we used the COVID-19 patient trajectories generator proposed by Marin-Garcia et al. [43]. Although the generator is not related to any specific hospital or region, patient trajectories (including types of care and length of stay at each stage of the trajectory) were fitted according to realistic data. The next subsection briefly describes how the arrivals of patients as well as their trajectories are generated. Then, two sets of experiments demonstrate the potential added value in terms of decision-making for managers.

The patient trajectories generator
To test the accuracy and relevance of the proposed tool, a dataset of realistic instances produced by the generator proposed in Marin-Garcia et al. [43] was used. Once adequately parametrized, the generator provides a list of patients' arrivals, and for each patient, a description in time of the care needed. More precisely, each dataset represents the demographics, comorbidities, dates of admission, and dates of transition from one unit to another according to the patient's symptoms or needs. The generator produces families of datasets (replications) that share the same generation parameters, but with changes to the randomization seed in each replication. The main parameters to run the algorithm include: the number of replications to be generated; the number of patients to be generated; the percentage of cases per day with respect to the total number of cases generated; the patients' age ranges and the percentage of patients desired in each age range; incidences of comorbidity and the percentage of patients who will have each of the considered incidences.
Realistic instances were generated by setting the generator's parameters to the real data associated with the needs of COVID-19 patients with severe or critical symptoms in the Valencian Community, Spain, a region with about 2.5 million inhabitants, from September 2020 to February 2021, thereby covering two full waves of the pandemic. Note that only confirmed cases of severely or critically ill patients requiring hospitalization were considered. We refer the interested reader to Marin-Garcia et al. [43] for a thorough description of the generator and its validation, as well as information on how to get access to the realistic instances.
There are many reasons supporting the use of models able to generate realistic datasets rather than real ones, particularly in the field of management science. First, data collected in a given hospital are the result of the strategies and processes applied in the observed case (the hospital) and its environment (its healthcare system). This dependency of the data on the particularities of the case limits the extent to which studies based on ''real'' data can be generalized to other contexts and favors the use of realistic or plausible data that are free of interactions with local variables or decisions [43,44]. For example, although the main hospital in Valencia on which the data are based never experienced saturation, we are well aware that many hospitals around the world were not able to offer their patients the care they required on time. Using real data from those hospitals would have led to trajectories affected by the specific resource availability and prioritization strategies. Moreover, models able to generate realistic datasets make it possible to consider situations that have not yet been experienced. The parameters of the model can also be customized to capture specific cases. Lastly, working on models that generate credible data (1) forces researchers to identify and define data structures that would guide further real data collection, and (2) speeds up research since their work can be validated, at least partially, without waiting for real data collection.
For the purpose of this research, 21 instances were produced by the patient trajectories generator. Each instance spans 120 days and, for each day, contains the new arrivals as well as the transfers of patients between care units, which makes it possible to reconstruct the trajectory of each patient. These trajectories will be referred to and considered as the ''real'' trajectories in the following.

Empirical identification of archetypes
We first identified the archetypes and fitted their parameters as described in Section 3.2. To this end, we randomly selected one of the instances, which will be referred to as the training instance. By analyzing the pathway followed by each patient in the instance, we identified three archetypes. The first archetype corresponds to patients who, after being admitted to a ward, were discharged a few days later; 73.9% of the patients in the instance followed that pathway. We compiled the length of stay for each of these patients and then fitted the parameters of the candidate distributions (Weibull, Normal, LogNormal and Gamma) using the maximum likelihood method. Probability plots were drawn for each candidate distribution and, after inspection, we selected the Weibull distribution with fitted parameters 2.96 and 14.51. The second archetype includes patients who were admitted to hospitalization but after a few days required intensive care. They stayed at the intensive care unit for some time and were then discharged or deceased (there is no information in the data allowing us to identify the outcome of the patient); 21.6% of the patients in the instance followed that pathway. The third archetype represents patients first admitted to hospitalization, moved to intensive care, and then sent back to hospitalization. Only 4.5% of the patients in the instance followed this pathway. For the second and third archetypes, we applied the same parameter estimation method and compared the probability plots to determine how representative the fitted distributions were. The results are shown in Table 1, which reports for each archetype its prevalence (column %) and the sequence of care units visited by patients in the archetype (Seq.), as well as the average total length of stay in the hospital (Av LoS). It also reports the probability distribution and parameters that best fit the respective length of stay at each care unit in the sequence (LoS 1 up to LoS 3) for each archetype.

Assessing the accuracy of the tool's predictions
To assess the accuracy of the occupancy predictions produced by the tool, we considered the remaining 20 instances in the testbed. From each instance, we extracted the patients' arrival dates and assigned simulated trajectories to each patient. To this end, we assigned to each arriving patient an archetype with probability equal to the archetype's prevalence. The archetype defines the sequence of services the patient will visit, and the time spent at each service is obtained by sampling the associated LoS distribution in Table 1. The simulated trajectories were then fed to the simulator, and the occupancy of the care units was computed and compared to the real occupancy produced by the real trajectories provided in the instance. Preliminary experiments led us to set to 20 the number of replications (repetitions) required to produce reasonable 95% confidence intervals. Results are reported in Table 2. The first four columns give the real occupancy (R) and the average occupancy predicted by the tool (S), in number of patients, for the hospitalization (Hosp) and intensive care (ICU) units, respectively, over the horizon (120 days). Table 2 also reports $e_{1\to t}$, the average daily error between real and predicted occupancy at a given unit (Hosp and ICU) from period 1 to period $t$, as well as a 95% confidence interval for this estimator. Confidence intervals for $t \in \{15, 45\}$ were computed using the Student distribution, while the Normal distribution was used for $t \in \{60, 120\}$. Table 2 shows that, on average, predicted occupancies are very close to the real ones. The prediction errors $e_{1\to t}$ are on average quite small, as are the half-widths of their confidence intervals. Furthermore, long-term predictions tend to underestimate the occupancy ($e_{1\to t} > 0$), both for hospitalization and for ICU beds.
A further analysis of the data showed that the prevalence of archetype 3 (the one describing trajectories H ; ICU ; H) in the training instance was lower than the average prevalence of archetype 3 over the whole testbed. Consequently, the tool tends to simulate fewer patients of archetype 3 (the ones with the longest total length of stay), which translates, as we go further in the prediction horizon, into a slight underestimation of the hospital occupancy. Overall, we consider that the proposed tool produces very accurate predictions and can therefore be used to anticipate, given a pattern of patients' arrivals, the occupancy of resources.
Finally, to give a visual appraisal of the accuracy, Fig. 3 shows the average real and predicted (simulated) occupancy of hospitalization and ICU beds over the 20 instances. Fig. 3 also illustrates how the hospital's occupancy grows to reach a maximum of 251 (real) occupied hospitalization beds on day 81, while a maximum of 49 (real) ICU beds are occupied on day 83.
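As a rough illustration of the comparison described above, the following Python sketch builds a daily occupancy profile from a list of stays and then computes the average daily error over periods 1 to t. The function names, the (first_day, last_day) representation of a stay, the sign convention (real minus predicted, so that a positive value means underestimation) and the choice of building the interval from the daily errors with a Normal quantile are assumptions made for illustration. The source does not detail the computation, and for short horizons a Student quantile would replace the Normal one.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def daily_occupancy(stays, horizon):
    """Occupied beds per day, from (first_day, last_day) stays.

    Days are 1-indexed and both bounds are inclusive.
    """
    occupancy = [0] * horizon
    for first_day, last_day in stays:
        for day in range(max(first_day, 1), min(last_day, horizon) + 1):
            occupancy[day - 1] += 1
    return occupancy

def error_with_interval(real, predicted, t, level=0.95):
    """Average daily error (real - predicted) over periods 1..t.

    Returns the mean error and the half-width of a Normal-based
    confidence interval computed from the t daily errors (an assumption).
    """
    errors = [r - s for r, s in zip(real[:t], predicted[:t])]
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(errors) / sqrt(len(errors))
    return mean(errors), half_width

real = daily_occupancy([(1, 3), (2, 5)], horizon=4)
```

Daily errors from the same instance are likely to be correlated, so an interval obtained this way should be read as indicative only.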

Using occupancy predictions to decide on hospital admissions
This set of experiments seeks to support managers' decisions concerning the admission of patients at a given hospital. The dynamics of bed occupancy are not trivial and are very difficult to anticipate. In fact, a hospital looking only at its current occupancy may accept too many patients so that, a few days later, its ICU collapses and some patients do not have access to ventilators or other life support equipment, putting their survival at serious risk. Other hospitals might adopt a more conservative approach, accepting fewer patients to ensure adequate capacity at the ICU, particularly if arriving patients can be referred to surrounding hospitals with available capacity.
A few works have studied how to manage access to intensive care in the context of COVID-19. Frej et al. [45], Love et al. [46], Silva-Aravena and Morales [47], and Gray et al. [48] have investigated the problem of allocating scarce resources (i.e. intensive care beds and ventilators) in the context of COVID-19. However, they all seek to select, among the patients requiring resources, those who have the highest probability of recovery, which is not our objective. We, on the other hand, study how to anticipate the future availability of ward and intensive care beds.
Berger [37] proposed a compartmental model to predict the disease progression, which has been adapted to account for the effects of social distancing policies. The model is linked to a feedback controller that keeps the number of infected people with moderate to severe symptoms (who typically require hospitalization) below a threshold defined by the number of available intensive care beds. Wood et al. [29] considered 3 scenarios of patients' arrivals, and two other scenarios where the length of stay at intensive care was reduced by 25% and the number of beds increased, and evaluated their impact on the expected number of patients unable to get intensive care. In a sequel, Wood et al. [38] tested 3 different triage strategies to decide on the admission to intensive care. Although their second strategy is related to our study (patients are admitted provided there is at least a certain number of beds available at the point of demand), their two other strategies seek to maximize short- and long-term survival, and to this end they limit the admission to, or the length of, intensive care for patients whose age is over a given threshold. That said, our experiments seek to assess, by means of our simulation tool, the performance of the following three admission policies or admission rules:
P1 - Blind policy. Every patient arriving at the hospital is admitted, provided that a ward bed is available (equivalent to the first-come-first-served rule).
P2 - Threshold on the ICU's occupancy level. A patient is admitted if the level of occupancy at the ICU is lower than a given threshold, and rejected otherwise. This rule tries to prevent patients from being blocked at the ICU, which has proved to be the system's bottleneck in most hospitals. We tested four values for the threshold (80%, 85%, 90%, and 95%) to observe its impact on the policy's performance.
P3 - Archetypes' simulation. Each time a patient arrives at the hospital, a simulated trajectory is generated based on the archetypes identified in Section 4.1. We use this trajectory, in addition to the simulated trajectories of the patients already in the hospital, to estimate the hospital's bed occupancy over the planning horizon. The admission of each actual patient is then decided according to the occupancy obtained by simulation. If the hospital appears able to provide the services required by the patient's trajectory for the patient's whole sojourn, the patient is accepted. Otherwise, the patient is rejected.
Notice that P3 uses archetypes to ''predict'' patients' trajectories that can be different from those of the actual patients, so errors in the occupancy can be made, as it was discussed in the previous section.
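The three admission rules can be written as small decision functions, as in the Python sketch below. It is a simplified illustration rather than the tool's implementation: the function names, the day-level granularity, the projected occupancy dictionary (which would come from the simulated trajectories of patients already in the hospital) and the assumption that P2 also requires a free ward bed are all introduced here for the example.

```python
def admit_p1(free_ward_beds):
    """P1: admit whenever a ward bed is available."""
    return free_ward_beds > 0

def admit_p2(icu_occupied, icu_beds, free_ward_beds, threshold):
    """P2: admit if ICU occupancy is below the threshold.

    Assumption: a free ward bed is also required, as in P1.
    """
    return free_ward_beds > 0 and icu_occupied / icu_beds < threshold

def admit_p3(projected, capacity, trajectory, today):
    """P3: admit if the simulated trajectory fits the projected occupancy.

    projected:  dict unit -> list of projected occupied beds per day
    capacity:   dict unit -> number of beds
    trajectory: list of (unit, days) simulated for the arriving patient
    Days beyond the projection are assumed empty (an assumption).
    """
    day = today
    for unit, days in trajectory:
        for d in range(day, day + days):
            occupied = projected[unit][d] if d < len(projected[unit]) else 0
            if occupied >= capacity[unit]:
                return False       # no bed left on day d in this unit
        day += days
    return True

capacity = {"H": 2, "ICU": 1}
projected = {"H": [1, 1, 2, 1], "ICU": [0, 1, 0, 0]}
accepted = admit_p3(projected, capacity, [("H", 2), ("ICU", 1)], today=0)
refused = admit_p3(projected, capacity, [("H", 1), ("ICU", 1)], today=0)
```

In this toy case the first patient is accepted because the ICU is free on the day it would be needed, whereas the second would reach the ICU on a day when its single bed is projected to be occupied.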
A hospital with 230 hospitalization beds and 20 ICU beds was simulated using 5 instances randomly chosen from the testbed presented in Section 4.1. The numerical results are reported in Table 3, which gives, for each admission rule, the average number of Admitted and Refused patients. Note that an arriving patient can be refused because there is no available bed at the hospital (No capacity) or because the admission policy says so (Rule decision). Table 3 also reports the number of Blocked patients as well as the average occupancy for hospitalization and intensive care beds. It is also important to mention that, since there were no blocked patients in the data used to fit the archetypes, we were not able to model how a blockage affects the patient's trajectory or the time spent in the services. Therefore, in the present experiments, if a patient is blocked waiting for an intensive care bed, we count the patient as blocked and the patient keeps the current ward bed for the time the patient is supposed to spend in intensive care. If an intensive care bed becomes available, the patient moves to it to complete the remaining length of stay in intensive care. By doing so, we ensure that the ward bed occupancy is not underestimated.
Policy P1, as one might expect, accepts the largest average number of patients (1409) but has to reject on average 91 arriving patients because the hospital was full at their arrival. On the other hand, on average 139 of the admitted patients are blocked, so they do not receive ICU services when they require them.
Policy P2 reduces the number of blocked patients with respect to P1, but at the price of refusing more patients. Unsurprisingly, as the acceptance threshold is set to lower values, both the number of admitted patients and the number of blocked patients decrease.
Finally, P3 reaches an excellent compromise between the number of accepted (1280) and blocked patients (16). Indeed, under policy P3, the hospital provides full treatment to 1264 patients on average and diverts 220 at their arrival, making bad decisions (i.e. accepting a patient who will not receive full care) in only 16 cases. Comparing these results with those produced by policy P1, we conclude that, although P1 and P3 correctly served almost the same number of patients (1270 and 1264, respectively), P3 was able to anticipate and divert to other hospitals the patients for whom the hospital would not have enough capacity, while P1 accepts patients without questioning their needs, resulting in a high number of patients unable to receive intensive care. Table 3 also confirms that P1 reaches the highest average occupancy for both hospitalization and intensive care beds. This is not a surprising result because, by definition, P1 accepts more patients than P3. Indeed, P1 accepts 93.93% of the arriving patients while P3's acceptance rate falls to only 85.33%. This explains most of the differences between the average occupancy reached by P1 and the other policies.
Finally, these results raise interesting questions on hospitals' capacity management, particularly in a regional context where several hospitals may show different levels of occupancy. In such a context, coordinated decisions between hospitals might make it possible to divert the right patient to the right hospital and thus minimize the number of patients who do not get access to the services they require.
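The acceptance rates and the numbers of fully served patients quoted above follow directly from the averages given in the text. The short derivation below only makes the arithmetic explicit; the total of 1500 arrivals is inferred here from the admitted and refused counts and is not stated as such in the source.

```latex
\begin{align*}
\text{arrivals} &= 1409 + 91 = 1280 + 220 = 1500 \\
\text{acceptance rate (P1)} &= 1409 / 1500 \approx 93.93\% \\
\text{acceptance rate (P3)} &= 1280 / 1500 \approx 85.33\% \\
\text{fully served (P1)} &= 1409 - 139 = 1270 \\
\text{fully served (P3)} &= 1280 - 16 = 1264
\end{align*}
```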

Using the occupancy predictions to decide on the right number of beds
This section draws on the results reported in the previous section to propose a last numerical experiment in which the tool is used to investigate how to set the right capacity for hospitalization and ICU beds. Table 3 shows that all the policies but P3 led to large differences between the hospitalization and the intensive care occupancies. Indeed, one of the questions that hospital managers faced during previous COVID-19 waves and that, to the best of our knowledge, has not yet been addressed in the literature concerns the number of intensive care beds that a hospital should have with respect to its number of hospitalization beds. For instance, the Mackenzie Health's Cortellucci Vaughan Hospital reported¹ an increase of 35 intensive care and 150 hospitalization beds to cope with COVID-19. After these additions, this hospital reached a ratio between the number of intensive care beds and the number of hospitalization beds of 23%, which is much higher than the ratios usually observed in general hospitals. To what extent were these increases appropriate, and how did managers determine the number of beds to add? The next paragraphs propose an empirical approach to shed some light on this question and, more precisely, to investigate how hospitals could adequately balance the number of hospitalization and intensive care beds.
To this end, we submitted several scenarios of patient arrivals to our tool and, assuming no restriction on the number of available beds, we recorded the largest occupancy at both the hospitalization and ICU units. Doing so ensures that each patient receives the required care. To better control the number of incoming patients, we modeled their arrival as a Poisson process with parameter k, where k is the number of patients arriving at the hospital per day. Parameter k was then varied to assess how the bed requirements were affected by the arrival rate. Experiments were run for values of k ranging from 24 to 192 patients per day (1 to 8 patients per hour), and for each value, 20 replications were executed. Fig. 4 shows the average number of hospitalization (Hosp) and intensive care (ICU) beds required to fulfill patients' needs (primary y-axis) for different patient arrival rates. The secondary y-axis shows the ratio between intensive care and hospitalization beds for each of the considered arrival rates. It can be observed that, unexpectedly, this ratio remains fairly constant over a very large range of arrival rates. Indeed, when the arrival rate increases from 24 patients/day up to 192 patients/day, the ratio decreases rather linearly by only 2.18 percentage points (from 18.60% to 16.42%). Fig. 4 also shows that the number of required beds grows fast with the arrival rate, but this growth seems, as was the case for the ratio, linear both for hospitalization and intensive care beds. More importantly, Fig. 4 gives an empirical bound on the number of beds required to provide adequate services to patients for a given distribution of patient arrivals. Using the tool, managers can propose different distributions of patient arrivals to estimate the number of beds required for various scenarios, using a minimal amount of real data.
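The experiment above can be mimicked with a short Python sketch that generates Poisson arrivals, assigns each patient a trajectory and records the peak number of occupied beds per unit when capacity is unlimited. The sampler toy_trajectory and all its parameters are hypothetical placeholders, so the printed ratios are not expected to reproduce those of Fig. 4. The number of replications is reduced to 5 here only to keep the example fast.

```python
import random

def toy_trajectory(rng):
    """Hypothetical two-archetype sampler; all numbers are placeholders."""
    if rng.random() < 0.75:
        return [("H", rng.weibullvariate(12.0, 2.0))]
    return [("H", rng.gammavariate(2.0, 2.5)),
            ("ICU", rng.lognormvariate(2.3, 0.5))]

def peak_occupancy(rate_per_day, days, sample_trajectory, rng):
    """Peak number of occupied beds per unit, with unlimited capacity."""
    changes = {"H": [], "ICU": []}            # (time, +1 or -1) per unit
    clock = rng.expovariate(rate_per_day)     # Poisson process: exponential gaps
    while clock < days:
        t = clock
        for unit, los in sample_trajectory(rng):
            changes[unit].append((t, +1))     # bed taken on arrival at the unit
            changes[unit].append((t + los, -1))   # bed released on departure
            t += los
        clock += rng.expovariate(rate_per_day)
    peaks = {}
    for unit, events in changes.items():
        occupied = peak = 0
        for _, delta in sorted(events):       # at equal times, releases come first
            occupied += delta
            peak = max(peak, occupied)
        peaks[unit] = peak
    return peaks

rng = random.Random(1)
for k in (24, 96, 192):                       # arrival rates in patients per day
    runs = [peak_occupancy(k, 120, toy_trajectory, rng) for _ in range(5)]
    hosp = sum(r["H"] for r in runs) / len(runs)
    icu = sum(r["ICU"] for r in runs) / len(runs)
    print(k, round(hosp, 1), round(icu, 1), round(icu / hosp, 3))
```

With unlimited capacity, Little's law implies that the mean occupancy of each unit is proportional to the arrival rate, which is consistent with the roughly linear growth and the nearly constant ratio reported above.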

Conclusions
Although a huge effort has been devoted to forecasting the spread of COVID-19 in the population, considerably less research has tried to assess the impact of COVID-19 on capacity management at the hospital level. This paper presents a data-driven simulation tool to support decision-makers managing the capacity of hospitals to ensure adequate access for COVID-19 patients. The tool encompasses a discrete event simulation approach based on archetypes extracted from empirical analysis of patient trajectories. This data-driven approach to identifying resource usage is one of the strengths of the tool because it makes it possible, using a rather small amount of data, to fit the archetypes to actual trajectories observed in any region and to adapt them to the particularities of current and forthcoming variants. In addition, the ability of discrete event simulation to represent the sojourn of individuals at a specific hospital enables the tool to capture the effects of the strategies and processes applied at any particular hospital and the characteristics of its population using a reasonable amount of data. Numerical experiments run on realistic instances validate the accuracy of the predictions produced by the tool and illustrate how the tool can add value to the decision-making process. In particular, we show how the tool can help managers make informed decisions regarding the admission of patients at a given hospital. Finally, a last set of experiments demonstrates how the tool can be used to estimate the right number of hospitalization and intensive care beds. In particular, empirical results show that the ratio between the number of intensive care beds and the number of hospitalization beds remains fairly constant for a very wide range of arrival rates.
This research has some limitations. Firstly, as mentioned earlier, the nature of the data produced by the generator did not allow us to compute the patients' outcome. With outcome data, however, it would have been easy to split archetype 2 into two separate archetypes according to the patients' outcome (recovered or deceased). Secondly, since the realistic instances that were analyzed do not include any blocked patient, we were not able to model what would happen during the simulation to patients who do not receive the care they require and how their medical situation might deteriorate. Indeed, we simply identified patients that were blocked and kept them in their current beds until a bed in the service they required became available. Thirdly, archetypes produced from historical data represent past patients and need to be updated regularly to account for changes in the disease, such as new variants, or for new medical treatments. These limitations lead us to conclude that, for deployment in a real-life context, both additional data and more sophisticated approaches (e.g. clustering methods) would be required to monitor and adapt the archetypes using data gathered in real time.
Finally, we emphasize that the tool does not include any particular model for patient arrivals, which are considered exogenous. Indeed, as discussed in the literature review section, a number of epidemiological models to predict the spread of COVID-19, and therefore arrivals at hospitals, have already been proposed in the literature. The tool can anticipate occupancy levels independently of the patient arrival pattern, so managers can consider various arrival models in their decision-making process.