Network memory in the movement of hospital patients carrying antimicrobial-resistant bacteria

Hospitals constitute highly interconnected systems that bring into contact an abundance of infectious pathogens and susceptible individuals, thus making infection outbreaks both common and challenging. In recent years, there has been a sharp incidence of antimicrobial-resistance amongst healthcare-associated infections, a situation now considered endemic in many countries. Here we present network-based analyses of a data set capturing the movement of patients harbouring antibiotic-resistant bacteria across three large London hospitals. We show that there are substantial memory effects in the movement of hospital patients colonised with antibiotic-resistant bacteria. Such memory effects break first-order Markovian transitive assumptions and substantially alter the conclusions from the analysis, specifically on node rankings and the evolution of diffusive processes. We capture variable length memory effects by constructing a lumped-state memory network, which we then use to identify individually import wards and overlapping communities of wards. We find these wards align closely to known hotspots of transmission and commonly followed pathways patients. Our framework provides a means to focus infection control efforts and cohort outbreaks of healthcare-associated infections.


Introduction
Antimicrobial resistance (AMR) poses one of the greatest risks to human health (Prestinaci et al. 2015). Currently, around 700,000 people worldwide die from infections with resistant pathogens each year, and this number is estimated to rise to up to 10 Million by 2050 (Interagency Coordination Group on Antimicrobial Resistance 2019; O'Neill 2016). Hospitals and other healthcare facilities act as key vectors for the spread of AMR through healthcare-associated infections (HAI) (Struelens 1998). Persistent colonisation of hospital patients and the networked nature of hospital processes underlying patient mobility will likely cause AMR to remain prevalent (Pastor-Satorras and Vespignani 2001b). Several factors moreover exacerbate the spread of AMR in healthcare facilities, including the selective pressures generated by increased antimicrobial usage, and the large pool of vulnerable patients, who are more susceptible to infections (Organization 2002). The need for infection prevention and control (IPC) can therefore not be understated.
Understanding the transmission dynamics of AMR promises valuable insights to improve IPC strategies. Key to these measures will be the analysis of patient pathways capturing the movement of patients carrying AMR during their hospital stay. Like many real-world systems, healthcare facilities have complex structure, which when ignored can limit the insights into the underlying dynamic processes. In this study we focus on mapping the movement pathways of patients known to carry antimicrobial-resistant bacteria onto physical structures of the hospitals. Specifically, we focus on patients colonised with Carbapenemase-producing Enterobacteriaceae (CPE). CPE is a particularly concerning form of AMR that confers resistance to carbapenems, broad-spectrum antibacterials often used as last-line antibiotics. CPE infections have recently seen a global surge amongst HAIs (Bonomo et al. 2018;Logan and Weinstein 2017).
Networks provide a powerful formalism to analyse the movement of patients in hospitals. Nodes typically represent physical locations within the hospital, such as wards, and edges represent the flow of patients between these locations, with edge weights encoding the volume of patient flow from one location to another. To facilitate analysis, we can aggregate the movements of individual patients into probabilities of transitioning between hospital wards (Donker et al. 2012;Bean et al. 2017). Typically, patient trajectories are broken down into individual transitions between wards: first, the number of transitions between each ward is summed across all patients and subsequently, for each ward the sum of all out-going transitions is normalised to one. The constructed network may then be interpreted as a first-order Markov model, where a random walker transitions with a probability proportional to the observed outflow volume from the current node to others in the network (Salnikov et al. 2016).
This dynamical assumption, whilst useful because of its simplicity and ease of implementation, is however limited by the assumption that transitions between nodes are independent of prior nodes within the patient pathway. Previous studies have indeed shown that first-order Markovian dynamics are not sufficient to fully model network dynamics of disease propagation (May and Lloyd 2001;Pastor-Satorras and Vespignani 2001a). Akin to shipping trajectories or passenger movement between airports, patient movement in hospitals tends to follow particular patterns dictated by medical or operational constraints. In particular, it is plausible that patient trajectories could bear 'memory' , that is, a subsequent move depends on several or all previous locations visited, and not solely on the current location leading to transitive dependence in the data.
Introduced by Shannon (1948), higher-order memory models have shown relevance across a number of applications, and a wide range of real world movement data (Song et al. 2010;Kareiva and Shigesada 1983;Chierichetti et al. 2012;Singer et al. 2014;Gonzalez et al. 2008;Heath et al. 2008) including several epidemiological data sets (Balcan and Vespignani 2011;Poletto et al. 2013). Ignoring such transitive dependencies and modelling patient movement via memory-less, first-order Markov models can distort both network topology and conclusions on the underlying process (Mucha et al. 2010). Despite the clear importance of transitive dependence, to date we only find one study (Palla et al. 2017) of hospital patient movement accounting for these relationships, and none when looking at AMR across healthcare facilities. Hence, in this study we investigate evidence for and implications of transitive dependencies in the movement patterns of hospital patients colonised with a CPE by including memory in our network models.
To model these effects, we use memory networks, which encode the memory of individual trajectories into higher-order transitive relationships, and which have successfully been used to investigate transitive dependence in pathway data (Lambiotte et al. 2019). To provide some intuition behind memory networks, consider a simple example of a small network of a hospital with five wards where the patients can follow one of two possible routes between the wards, and the two routes share one common node (Fig. 1a). A firstorder (memory-less) network model assuming full transitivity (Fig. 1b) would wrongly suggest that a patient starting at v 1 could transition to v 5 with some probability, when in fact, only patients starting at v 2 can reach v 5 . In a memory network (Fig. 1c) these transitive dependencies are captured by abstracting away from a network of physical nodes to a higher-order networks of state nodes representing the possible dynamical states of the system (i.e., the sequence of hospital wards visited up to a given memory) (Edler and Bohlin 2017). For a memory network, the order k, determines the number of previous pathway steps to consider in the state network, acting to parameterise 'memory' . Such memory is Fig. 1 Illustration of transitive dependence encoded into memory network. a Two sets of typical patient pathways, largely independent, but passing through the same ward as an intermediate point in their pathways. b First-order representation of a without any memory (c) Memory network representation a, whereby a physical node network maps to state nodes, which encode transitive dependence of the patient pathways and constrain a random walkers movement directly incorporated into the state network though each state node representing a pathway of length k − 1 present in the trajectories. This state network can be thought of as an additional layer of information still bound to the physical network since each state node is assigned to a physical node. The state network thus acts to constrain how a random walker transitions between physical nodes. These higher-order network abstractions lend themselves to learning tasks that can pinpoint key properties underlying the dynamical process. In the case of HAIs, this can offer insight into more accurate patterns in the movement of infected patients otherwise lost in a network model that assumes full transitivity.
Below we present the analysis of patients pathways confirmed to be colonised with CPE. We begin by presenting our data and a description of the hospital network. We then present evidence for memory within patient pathways by contrasting models constructed with and without memory. Next, we construct a lumped state memory network, which captures transitive dependence and removes redundancy. We carry out multiscale community detection on this network, and present the resultant communities, highlighting specific wards and specialities that are important across different regions of the network. Finally, from a clinical point of view, we discuss how these results can aid infection prevention and control to identify hospital structures that are relevant for disease transmission and thus to focus intervention strategies.

Data
Our analysis is based on anonymised electronic health records of patients from a large 1000-bed Trust of London teaching hospitals. Specifically, we used ward-level movement patterns of 967 patients who tested positive for CPE over a period of two years between 2018 and 2020. We focused on the subset of 526 patients who moved between at least two wards during their hospital stay for a total of 1958 transitions between 96 hospital wards.
Formally, the hospital Trust is structured around 17 specialities and 19 buildings, the latter belonging to three hospital sites (Fig. 2). Hospital site 3 acts as a Tertiary site with only speciality wards. Whilst sites and buildings are constrained by geographical factors, specialities are defined by medical procedures and thus may overlap across sites and buildings. In fact, a number of specialities span all five hospital sites (Critical Care, Elderly Care, Medicine, Private, and Surgery). Geographical structures constrain patient movement to some extent: patients with certain co-morbidities and therapeutic requirements are likely to be constrained to a single or several specialities supporting those needs, whereas other patients can move within buildings, or between wards placed closely for logistics and ease of transfer.

From patient pathways to network models
We consider the trajectories of p patients. Each patient pathway as a trajectory T α and the set of α = 1, . . . , p trajectories is T = T 1 , T 2 , T 3 , . . . , T p . Each T α consists of a timeordered set representing discrete movements between nodes, (1) where each node refers to one of N hospital wards N = ν 1 , ν 2 , ν 3 , . . . , ν N . Since these nodes represent physical locations, we will refer to them as physical nodes to avoid confusion with state nodes, which we introduce next.
In order to understand the aggregate dynamics of all patients, whilst preserving memory effects in T , we represent the trajectories as a memory network as proposed by Rosvall et al. (2014). This way, we maintain information about physical nodes N whilst instilling transitive dependence in the connectivity patterns of an underlying state-network, M k = (E k , S k ) . Here E k is the set of edges that link the set of state nodes S k that capture higher-order memory of order k (Edler and Bohlin 2017).
A memory network of order k = 1 , M 1 , represents a system with zero memory, where the movement of a random walker only depends on its current location. In this special case, the state network M 1 = (E 1 , S 1 ) is equivalent to an aggregated physical network, and the set of states directly maps to the set of physical nodes, i.e., S 1 = N . The edge weights w ij conforming the set E in M 1 represent the frequency of transitions between physical nodes ν i and ν j across the set of trajectories T . Given w ij , we can write the transition probability matrix P 1 for M 1 as In memory networks of higher-order, where k > 1 , state nodes represent pathways of length k − 1 , and are no longer equivalent to the physical nodes S k = N . This representation allows us to introduce the memory dependence in T , capturing multi-step patterns of flow via the state nodes of the network (Salnikov et al. 2016).
(2) In particular, for the second-order memory network M 2 , a state node represents a directed pathway of length one s j = − → ij . For two states nodes s j = − → ij and s ℓ = − → jℓ to be connected, a path of length two, ( ν i → ν j → ν ℓ ), must occur in the set of trajectories T . Similarly for higher-order models, edges between state nodes are weighted w s j s l and capture the number of occurrences that a transition between state nodes s j and s l was observed in T . Transition probabilities P k of M k for any order can be derived from Equation 3 by altering the state node set S to represent pathways of length k − 1 , so Each state node can be mapped to a physical node (Fig. 1a), using an |S k | × N indicator matrix D, the elements of which, D sν ∈ 0, 1 , indicate the final physical node of a pathway s.
We first constructed a first-order memory network M 1 that contains 96 state nodes with a one-to-one mapping to the 96 wards (physical nodes). M 1 consists of four weakly connected components, one of which contains the majority of state nodes (87 out of 96) (Additional file 1: Fig. S1). We next constructed a second-order memory network M 2 that contains 384 state nodes, in 18 weakly connected components. Similarly, M 2 consists of a single weakly component that contains the majority of state nodes (329 of 384). Structurally, M 1 has a higher connectivity with a clustering coefficient of 0.287 and a diameter of 6, whereas M 2 is more sparse with a clustering coefficient of 0.003 and a larger diameter of 31, resembling a series of connected line graphs (Additional file 1: Fig.  S1).

Patient trajectories break first-order dynamics
Using random walks to reveal and probe the structure of networks has long been a foundational tool in network science (Masuda et al. 2017). A random walk is a stochastic process which consists of a succession of random steps with no memory of its past locations; however, in a system where transitive dependence plays a important role, a purely random walk becomes inaccurate and potentially misleading. Higher-order memory networks can capture deviations from first-order transitive assumptions by constraining where a random walker can next move depending on its previous location(s). For pathway models without transitive dependence, a random walker should be no more constrained when moving from a first-order memory network, M 1 , to a second-order memory network M 2 (Rosvall et al. 2014). However, pathways exhibiting transitive dependence will constrain a random walker comparatively more in the second-order memory network. Here, we use the entropy rate of the random-walk to measure the uncertainty of moving between two state nodes (Shannon 1948;Schaub et al. 2012): where π denotes the stationary distribution across M , and p(i → j) are the transition probabilities. A higher entropy rate reflects a higher uncertainty in the transitions of a random walker (Schaub et al. 2012). If a large amount of memory is important, we expect a decrease in the uncertainty of the random walker when accounting for the higher-order effects, since a random walker becomes more constrained. On the other hand, if memory plays little role, and does not constrain the transitions of the random walker, we would expect little change in entropy when accounting for memory.
We constructed memory networks M k of order k = 1, 2, 3, 4 (description of the number of state nodes, edges and pathways for each M k is detailed in Fig. 3a). Computing the entropy for each M k we find increasing restriction of the random walk (reduced entropy) for larger k (Fig. 3b). In particular, we observe a large decrease in entropy from 2.70 to 0.57 when we move from first-order memory to second-order memory. Patient pathways with little to no memory effect would not exhibit any large change in entropy when moving from M 1 to M 2 and thus our results suggest that there exist patient pathways which break first-order Markovian transitive assumptions and highlight the importance of capturing memory. Now we must determine the optimal order k for a given analysis. For small data sets, it is difficult to statistically validate whether memory networks with higher-order are relevant, given that the parameter space and complexity increases exponentially (Scholtes 2017). A common workaround is to use cross-validation, a model validation technique borrowed from machine learning (Singer et al. 2014). In cross-validation, data is partitioned and performance is determined as an average across partitions to reduce overfitting and selection bias (Arlot and Celisse 2010; Cawley and Talbot 2010). To perform cross-validation in the framework of a memory network we compute the rank orders of wards using a training set of patient pathways and then compare with the rank order of wards generated from visitation probabilities of a withheld partition of patient pathways. Similar to Rosvall et al. (2014), we used a generalised PageRank for higher-order models where the visitation probabilities of state nodes were summed for each physical node (see methods section: "Higher-order PageRank" section). The rank orders between train and test sets were compared with Kendall-Tau rank correlation (Kendall 1938) and the results were averaged over a fivefold cross-validation split. We found that M 2 was more predictive of the node ranking of physical nodes than M 1 (0.60 to 0.49) (Fig. 3b). Moreover, across all folds the ranked correlation was more significant in M 2 than M 1 (Additional file 1: Table S1). This increased performance in M 2 again suggests that a patient's current and previous location both affect future movement, and that accounting for this memory effect yields more accurate approximations of patient movement. Whilst further higher-order memory effects may exist, we were unable to detect any increased predictive power beyond the second-order (Fig. 3b). We note that this may be due to limitations of our data; as we increase the order k, we must discard additional patient pathways with fewer than k transitions between wards. This is evident in Fig. 3a that shows a decreasing number of pathways as we increase the order; and whilst the number of state nodes and edges initially increases from the first-order to the secondorder, as you may expect by increasing model complexity, due to the decreasing number of pathways we instead observe a decrease in the number of state nodes beyond the second-order. Herein, to retain enough patient pathways for reliable insights, we thus shall focus on the second-order memory network.
We then compared the PageRank of physical nodes (wards) between M 1 and M 2 (Fig. 3c). Whilst we found the PageRank of wards in M 1 and M 2 to be correlated ( σ = 0.81 , p val<0.01), there were a number of key deviations. In particular, we find three renal wards (Renal 1, 2 & 3) with a relatively higher ranking in M 2 , indicating that CPE patients frequently visit these wards. These results are reassuring, given that patients undergoing renal therapies at our institution were previously noted to have an increased risk of CPE acquisition (Otter et al. 2020), though whether this is unique to our institution or a more specific trait of this patient group is to be determined. The higher ranking of these Renal wards in M 2 highlights the importance of using a constrained state node network to understand the clinical movement of these patients. Conversely, Medicine 13 was the highest ranked ward in M 1 , but was found to have a relatively lower rank in M 2 . Medicine 13 is an acute medical admissions unit, and as such acts as the entry/ re-entry point for many patients to the hospital, rather than a transition ward or a ward which offers care, and whilst it plays a starting role in many patient pathways, it is seldom observed at any other point in a patients trajectory through the hospital.

Investigating memory effects with a discrete diffusive process
One way we can study the effect of memory is through the direct observation of its influence on a diffusion process starting at various points in the network (Lambiotte et al. 2019). Figure 4a, b displays the evolution of a discrete-time diffusive process for M 1 and M 2 , each encoded by their respective transition matrix P k , when injecting an impulse at a single ward (Medicine 13). At time t = 0 , the diffusive process is entirely contained within the state node(s) corresponding to Medicine 13 (beyond the first-order, where physical nodes can have several state nodes, we share initial probability over states based on the frequency of pathways in T that s j = − → ij represented). For times t > 0 we compute the probabilities of being on a given ward at time t through powers of the transition matrix P t k .
After a single discrete step t = 1 we find there is little effect of memory with the total number of wards reachable being similar for M 1 and M 2 (12 wards vs 9 wards, respectively). However, as we extend the diffusive process to t = 2 and t = 3 we find that the number of reachable wards from Medicine 13 increases rapidly for M 1 (36 wards at t = 2 , then 71 wards at t = 3 ) whereas we do not see any change in M 2 (9 wards at t = 2 , and 9 wards at t = 3 ). In fact, for M 1 a random walk initialised at Medicine 13 can reach 71 out of the 79 wards within the largest weakly connected component in T after just 3 steps. This level of transitivity is not present in T , and its absence is directly observable by looking at the restriction of flow evident in M 2 (Fig. 4a, b). This difference comes from patients not starting at Medicine 13, but passing through its neighbours influencing the 2-step network transitivity.
Interestingly, M 2 constrained walkers such that no backtracking to Medicine 13 is possible over the first 3 discrete transitions, in contrast to M 1 , where backtracking to Medicine 13 is possible for t > 1 . In fact, using M 1 there is a relatively large probability to revisit Medicine 13 after 2 or 3 steps ( p 2 med13 = 0.18 and p 3 med13 = 0.24 ). Given that Medicine 13 is commonly an entry point/readmission point where patients go when waiting for diagnosis, we would expect a minimal backtracking effect in patient movement across short time frames since they move into subsequent specialities for treatment once a diagnosis is known. Hence, including memory through M 2 better captures true patient flow. Medicine 13 Medicine 13 Medicine 13

Fig. 4
Memory effects on network pathways. a, b A discrete random walk starting at Medicine 13 in the state networks of M 1 and M 2 , modelled as a discrete time diffusion over the transition matrices P 1 and P 2 respectively. After a single discrete time step t = 1 the reachable number of wards is similar between M 1 and M 2 , however, they quickly diverge as the diffusion evolves over time and the transitive effects increase in M 1 . c, d Median size of the reachable set of wards for M 1 and M 2 : c Overall reachability after t-discrete steps, and d reachability after t-discrete steps broken down by speciality (in the largest connected component)

Forward reachability is varyingly constrained by memory
We expanded the above framework to examine reachability across the entire network by performing the analysis for every possible starting node. For each ward, we compute the set of reachable wards after t time-steps and in Fig. 4c we display the median size of reachable sets for all wards under M 1 and M 2 . Similar to the analysis of Medicine 13 in Fig. 4a, b, we find that the median size of reachable sets is relatively similar between M 1 and M 2 at t = 1 . However, as t increases we again observe divergence in the reachable set sizes due to the significantly larger set of reachable wards in the first-order model M 1 . Indeed, after 3 time-steps only 5 wards are reachable on average under M 2 as compared to the 79 reachable wards under M 1 . Hence M 1 is inflating transitivity between wards and distorting the set of reachable wards for a patient through inherent ignorance of prior ward visits. We also observe that the variance of the reachable set of wards for M 1 increases for t = 2, 3 , suggesting that the importance of memory is different depending on the ward at which the diffusive process was initialised.
To study this, we next break down wards by speciality and examine the importance of memory on the median set size of reachable wards. Figure 4d summarises the size of reachable sets averaged across wards within the same speciality. We notice that specialities which are known to be well visited by CPE patients in this hospital setting (e.g., Critical Care, Renal) exhibit a comparatively larger reachability set size when compared to the aggregated view in Fig. 4c. In contrast, specialities such as Neurology or Cancer which are less common to CPE patients exhibit a relatively lower reachability. These different reachabilities between specialities likely result from two mechanisms: (1) the different roles specialities play within the network and their transitivity by CPE patient trajectories, and (2), that memory effects may vary in different areas of the network, i.e. the extent to which a previous ward determines a patients next move. Hence, it may be optimal to construct a 'hybrid' of M 1 and M 2 which incorporates many of the desirable memory effects in M 2 , but simplifies parts of the model where greater transitivity is in fact present.

Reducing complexity using state node lumping
Given a large set of trajectories, the problem arises that state node networks M k can become increasingly large and often duplicate or contain redundant information. In the case of patient trajectories, not all hospital pathways may exhibit memory effects in equal measure. Variable-length Markov models, pioneered by Rissanen (1983) alleviate some of these issues by introducing a 'lumping' step in which 'redundant' states are merged, thus enabling models to capture variable lengths of memory and remove model redundancy (Jääskinen et al. 1983;Bühlmann and Wyner 1999). Remembering that in memory networks, state nodes are assigned to physical nodes, we will often find several state nodes that are connecting the same physical nodes just via different edges. There is no need for this repetition and therefore here we focused on lumping state nodes within the same physical node to form so called 'meta-state nodes' or 'lumped nodes' which also benefit from preserving the physical network structure (Lambiotte et al. 2019). For each lumped node, we reassemble all connections between two states nodes such that weighting and connectivity are preserved (Edler and Bohlin 2017). In effect, 'lumping' nodes retains relevant and distinct patterns of transitive dependence in the original pathways; however, for our purposes it also serves to 'de-sparsify' M 2 , improving its practicality and making it useful for subsequent learning tasks that assume greater connectivity.
In our approach, we lump together state nodes based on the similarity of visitation probabilities computed from a discrete diffusive process encoded in the state transition matrix P k over t-steps. Existing node lumping methodologies use a 1-step random walk to identify state nodes that have similar connectivity within the network (Edler and Bohlin 2017;Persson et al. 2016). Here, however, we extend this approach to t-steps to identify similarity across a greater network locality. Using an agglomerative clustering method on the discrete diffusive process, we can then identify state nodes with similar connectivity, and if both are members of the same physical node they can be lumped together (Hastie et al. 2009). Importantly, this agglomerative clustering is parameterised by a clustering rate r, which controls the total proportion of state nodes to lump (for a detailed explanation refer to methods section: State lumping on local connectivity).
To what extent should we lump state nodes together? At one extreme, we have the state node network M k without any lumping and at the other extreme we have the physical node network where every state node has been lumped together within its respective physical node. We want to identify an optimal amount of lumping, comfortably between the two extremes, that retains transitive dependence but removes redundant or duplicated information. The resulting lumped network is denoted M k . In order to quantitatively determine the optimal lumping, we used 'ground-truth' community structures such as buildings, specialities, and hospital sites and compared these annotations with the results of community detection on the lumped network M k . Whilst these structures do not fully constrain patient movement and therefore cannot provide an exact ground truth, there does exist a correlation with patient movement. We hypothesised that the optimal lumping would be found at the elbow of a fitness curve generated from the ability to detect known hospital structures in community structures, thus providing a tradeoff between model accuracy and simplicity. Accordingly, we found that a clustering rate of r = 0.35 gave the optimal lumped model (Additional file 1: Fig. S3).
The lumped network M 2 contains 171 state nodes across 7 weakly connected components. Similar to the state node networks M 1 and M 2 , we found a large weakly connected component that contained the majority of state nodes (156 out of 171) (Fig. 5).

Fig. 5
From first-order network to second-order network and everything between. The first-order network M 1 (left), the lumped second-order network M 2 (middle), and the second-order state node network M 2 (right) ordered by scale of model complexity Aside from visually appearing to exist in a state between M 1 and M 2 , both its clustering coefficient (0.054) and network diameter (11) sat comfortably between M 1 and M 2 , serving to validate its balance of complexity, connectivity, and higher-order dependencies. Note that unlike M 2 , the lumped state network M 2 no longer resembles a series of lines graphs, and hence provides a more practical structure over which to apply community detection.

Community detection reveals overlapping clusters of wards common to distinct pathways
By constraining a walkers movement within the connectivity patterns of M k , for k > 1 , we can identify communities within M k that conserve flow from a dynamical perspective. Given that M k is composed of state nodes, the memory-dependent structure C will provide network partitions that shed light into community structure. Here we use Markov Stability (MS), a quasi-hierarchical community detection algorithm that identifies regions within a network in which a diffusive process becomes transiently constrained (Delvenne et al. 2010). MS exploits diffusion dynamics over an underlying graph structure to reveal multi-scale community organisation and their stability across time scales (see methods: Dynamical community detection).

The quasi-hierarchical community structure of the wards
Continuing with the lumped state network M 2 , we apply MS and in Fig. 6 we show an apparent hierarchy of state node assignments to community partitions across Markov time t. We selected three points across Markov time ( t 1 ,t 2 ,t 3 ) that exhibited robust community partitions (Additional file 1: Fig. S5). At longer time scales MS reveals coarser community partitions which show significant correspondence to hospital sites (Fig. 6). Specifically, at t 3 each cluster in the 3-way partition strongly corresponds to one of the three hospital sites. If we extend to even longer t we identify a 2-way partition where two hospitals are grouped almost exclusively into a single community (Additional file 1: Fig. S7). Notably, the hospital with wards grouped separately is the Tertiary site within the hospital trust which consists of speciality wards and appears to share fewer patients with the other two hospitals.
Moving towards shorter t within the MS analysis, which are expected to identify more granular structures of patient flow, we identity sub-structures largely contained within hospital sites, which overlap to a lesser extent between hospital sites (Fig. 6a). In some cases, these confer to buildings (we find 10 buildings that are over-represented in clusters at t 1 ), in other cases these confer to specialities (we find 7 specialities over-represented in clusters at t 1 ). Focusing initially on speciality, we find three specialities (Haematology, Cardiology, and Renal) that are over-represented within separate communities suggesting they are have a high degree of within speciality patient movement (Additional file 1: Fig. S8). However, as we increase t to reveal coarser partitions we see the more granular communities combine, bringing together previously distinct specialities such as Haematology or Renal into coarser partitions with other specialities, highlighting the zooming affect of MS as we change the t at which communities are observed. However, it is clear that the community structure is not entirely defined by specialities and the physical constraints imposed by buildings, hospitals, and common movement patterns play a significant role and result in our observed communities (Fig. 6b).
Given that the majority of patients will move between specialities at some point during their journey through the hospital, it is expected that communities would not correspond exactly to ward specialities. Often this is attributable to patients seeking treatment for comorbidities, additional to their primary condition. The effect of such movements is a memory effect whereby patients will transition back to wards treating their main condition. In fact, several specialities primarily service these secondary conditions. An example is Medicine, a general class of ward that, as well as taking admissions also offers general treatment and support. Critical Care is another example with high expected memory effects, since it services patients from any ward if they deteriorate fast enough. Notably, we find that wards both in the Medicine and Critical Care specialities can be found within 10 different communities at t 1 . Additionally, Surgery, another department that services multiple other wards, can be found in 9 different communities.

Overlapping community assignments
Although community detection generally is generally used to find 'disjoint' communities, multiple community membership is a well observed phenomenon, whereby a node may have multiple functions that it shares with different groups of nodes (Xie et al. 2013). Understanding that we are essentially clustering wards based on the movement patterns of patients, it is likely that different cohorts of CPE patients (e.g. with different comorbidities) have overlapping pathways. For instance, different cohorts of patients still require a set of common services and hence visit an overlapping set of wards (e.g. for admission, surgery, critical care, or renal dialysis). This phenomenon is well captured by memory networks, standard methods of community detection applied across the state network are able to reveal overlapping communities of nodes on the physical network. Additionally, the notation of granularity introduced by MS adds an interesting dimension to this problem, whereby the degree to which wards overlap communities can depend on the point Markov time. We can thus identify hospital wards which persistently overlap multiple communities across both granular and coarse time scales. These wards are of particular interest when developing Infection Prevention and Control strategies as they can play the role of network bridges and potential transmission hotspots.
At the most granular time scales, we find 48 wards with multiple community assignments (Additional file 1: Tables S2-S4). With increasing Markov time the total number of overlapping wards decreases; however, there exist several wards which are persistently overlap communities. We find 4 Renal wards and a single Elderly Care ward which have membership within each community of the 2-way coarse partition. Despite disappearing in the very coarser 2-way partition after t > 12 , Critical Care, Medicine, and Surgery, as well as a single Elderly care ward also overlapped between communities. Since the most coarse partitions strongly corresponded to non-specialist hospital, and specialist hospital sites, it is likely that Critical Care and the Elderly care wards still play a strong connective role within connecting the two non-specialist hospital Sites 1&2.

Identifying the most central wards
In the previous section we identified nodes that were assigned to multiple communities, highlighting their critical role in the pathways of multiple cohorts of patients with differing patterns. We also used PageRank to identify the importance of wards in M 1 and M 2 .
To allow for a more complete examination of ward importance, and to provide further investigation of M 2 , we use Multiscale Centrality (MSC). MSC is a measure of centrality that enables us to identify nodes that are important in the network at different scales . Following the same approach to compute centrality of the physical nodes, we compute MSC for each state node and then compute the sum of state node centralities across each physical node to generate a value of MSC for each ward. Figure 7 shows the results of MSC computed for M 2 . We find several wards that are central at all scales, implying that they are both highly connected locally (short time scales), and also important as global connectors/bridges (long time scales). Both Medicine 13 and 14 appear as central at all time-scales; Medicine 13 and 14 are both admission and readmission points into the hospital, where patients will be first identified as positive for CPE, and where they will return if readmitted. Additionally, we find four renal wards are central at all scales; note that three of these renal wards (Renal 1, Renal 2, and Renal 3) were also ranked highly in our previous analysis of M1 and M2 with Pagerank. Conversely, we also find wards that vary considerably in their importance across time-scales (Fig. 7b).

Conclusions
Analysis of patient movement can give valuable insights to understand disease dynamics and inform Infection Prevention and Control. Here we examined the common assumption of memoryless-ness in the movement of patients. To this end, we employed network analysis and compared networks with and without memory.
Models with memory have a substantially larger parameter space. We therefore constructed a simplified memory model based on a hybrid 'lumped' memory network, which retains the effect of memory in the patient trajectories but removes redundant or duplicate information. Such hybrid models are particularly useful for networks which exhibit different levels of memory depending on the localities. In this context, we extended previous work on lumping in memory networks in two ways: firstly, we defined a node feature vector that allows state nodes to be compared and lumped into meta nodes based on longer random walks; secondly, we proposed that lumping could be optimised by using prior knowledge with known communities which partially constrain patient pathways. This framework for constructing lumped memory networks is generalisable to both other hospital sites and other types of pathway data, where the underlying characteristics of the system predetermine many routes of movement. In the hospital context, patient trajectories are constrained by common pathways of patient treatment and care.
The lumped memory network formed the basis of subsequent investigation to detect communities of movement within our healthcare network. We used prior knowledge, including the hospital structure and specialities, to optimise the rates of Note that at short scales, some nodes will have not been assigned a Multiscale centrality value and so share the bottom rank lumping based on the Markov Stability graph clustering framework. As a result, we could highlight clusters of patient movement with higher-order memory and identify wards that appeared in multiple communities. The communities of patient movement divided the hospital sites quasi-hierarchically into sub-communities of wards that share patient flow. We found correspondence between community structures and known structures, such as hospital buildings or specialities; yet the communities also result from common pathways specific to certain cohorts of patients amongst this hospital group.
The higher-order network framework was specifically applied to analyse a large data set of CPE patient pathways collected in three large London hospitals. The analysis showed that the movement of hospitalised patients colonised with CPE displays substantial memory, i.e., ward transitions depend on previously visited wards. The presence of memory was identified by comparing node rankings with differing degrees of memory, as well as the statistics of a diffusion process on the resulting network models. Notably, we found that including memory in the network model increased the centrality of wards that are known clinically to be commonly visited by CPE patients (e.g. Renal) and decreased the centrality of wards that are less clinically visited amongst CPE patients (e.g. Paediatric). We note that the effect of memory can also be studied by analysing differences in the distribution of standard node centrality statistics (Additional file 1: Fig. S11). Memory also greatly affected local reachability. For example, the memory-less first-order model wrongly implied that patients could reach almost any ward within three ward moves after first entering the hospital. Our analysis, on the other hand, shows that accounting for where patients had previously been, dramatically restricts the possible set of subsequently visited wards. Hence, ignoring pathway memory in hospital patients can affect the outputs of commonly used network analytics tools, and potentially misrepresent the importance of hospital wards.
Understanding the constraints of patient movement can aid IPC. We showed that the ranking of wards and the likelihood of infected patients visiting particular wards was more accurate in the memory network than the memory-less network. In particular, we found that by extending beyond 1st-order memory, wards known to be associated with CPE were ranked more highly. Pinpointing at-risk wards is critical to focus IPC efforts and prevent transmission. Identifying communities of patients with distinct movement patterns, moreover, is valuable to cohort outbreaks within these communities and to prevent spread to other communities. Indeed, we found that the overlaps between communities revealed wards visited by almost all CPE patients (Renal wards) and wards visited commonly by the general patient population (Medicine, Surgery, and Critical Care wards). Such wards are prime targets of enhanced prevention efforts to reduce transmission.
Overall, our study highlights the role of memory in patient pathways. Most current analyses of patient pathways assume memoryless-ness as a first approximation. Here, we showed that ignoring memory may misidentify potential hubs of disease transmission. Our study gave some indication for memory beyond the second-order, however, we were limited in its detection due to the need of increasingly larger datasets. Future work incorporating higher-order effects may therefore give further insights into the precise nature of memory in patient movement. Our analysis suggests that informing policy based on traditional memory-less networks can miss key characteristics of movement patterns. For IPC this can mean missing transmission hubs and wrongly directing screening efforts to less critical locations, resulting in poor use of resources and lower efficacy. Our lumped memory network thus provides a framework for future patientpathway analyses aimed at improving containment of CPE, and may be generalised to inform infection prevention and control of other HAIs.

Higher-order PageRank
PageRank is a measure of node importance or centrality within a network based on the incoming edges (Page et al. 1999). To obtain Higher-order PageRank we follow the derivation presented by Rosvall et al. (2014). PageRank is essentially computing the visitation probabilities to nodes over a network, determined by connectivity and weighting of those connections. In the context of a memory network, one can simply derive PageRank over the underlying state network for a model of arbitrary order k, and project the visitation probabilities back onto the physical nodes.
Firstly, we define the probability of finding a random walker on a given state node s at time t + 1 as where as before a state confers to a pathway of length k and transition probabilities are encoded by the transition matrix P. Now, for any order k the higher-order generlisation of PageRank is simply the stationary solution to equation 5: With π(s j ) it is then trivial to return the physical node PageRank by summing over a physical nodes states:

State lumping on local connectivity
Given a large set of trajectories, the problem arises that state node networks M k can become very large and often contain redundancies. Not all pathways exhibit full transitive dependence, so it can often be desirable to reduce the model complexity by lumping together redundant state nodes. Redundancy of state nodes can lead to over-fitting when a physical node contains a number of similar states. Hence, we focus on lumping states nodes within the same physical node, forming so called 'meta state nodes' which also benefit from preserving the physical network structure (Lambiotte et al. 2019). For each lump, we reassemble all connections between two states nodes such that transition probabilities and connectivity are preserved (Edler and Bohlin 2017). In effect, 'lumping' state nodes together reduces the model complexity whilst retaining the transitive dependence of the original pathways.
In our approach, we lump together state nodes based on the similarity of visitation probabilities of the physical nodes. To do this we use the S × S state transition matrix P over k-steps and then sum the probabilities over the state nodes that compose each physical node. In the construction of P we add weighted self loops equivalent to a nodes total outflow weight w s i s i = s i w s i s j to derive P ′ with Eq. 3. This self loop conserves local flow across P ′ , emphasising local connectivity when we subsequently determine distances across X.
We define the state node to physical node transition matrix X as the visitation probabilities of each state node to each physical node over k-steps, X = P k D , where P is the state node transition matrix and D is the S × N state node to physical node indicator matrix. Each entry x ij corresponds to the probability of transitioning from state node i to physical node j and thus provides a mapping from the higher order state node network to the physical node network. Here, we set k = 3 to incorporate a slightly larger range of local connectivity than previous works that use k = 1 (Edler and Bohlin 2017; Persson et al. 2016).
State nodes with similar local connectivity will exhibit similar probability distributions on the physical node network, therefore we can compute a similarity matrix between state nodes by computing the Wasserstein distance (Villani 2008) between vector rows of X which measures the distance for moving from one probability distribution to another. The similarity matrix was subsequently clustered using an agglomerative clustering method for lumping state nodes within physical node (Hastie et al. 2009).
In order to control the degree to which state nodes are lumped we employed a clustering rate r, which sets the number of final lumped state nodes that should be constructed for each physical node after completion of the lumping procedure. For example, lets consider a scenario where we have two physical nodes, one of which is composed of 10 state nodes and the second is composed of 20 state nodes. If we set the lumping rate r = 0.2 , then after lumping the first physical node would have 2 lumped nodes after the procedure whereas the second physical node would have 4 lumped nodes. Increasing the lumping rate to r = 0.8 would mean physical nodes retain more of their states after the Fig. 8 State Lumping Example for a single physical node ( ν 1 ). Two possible lumpings M ′ and M ′′ are visualised here over the state nodes (grey nodes) with the physical node mapped over (dotted circles surrounding state nodes). Here each lump merges the two most similar state nodes based on feature vectors capturing local visitation probabilities of k = 2 network steps lumping, and for our example would result in those physical nodes having 8, and 16 final state nodes respectively.
Consider a simple illustrative lumping example in Fig. 8 which demonstrates the lumping process for a single physical node (circle of red dashed lines, ν 1 ) and its constituent state nodes (grey circles within the red dashed circle) for different values of k. For the case k = 1 (see M ′ in middle of Fig. 8) only the nearest neighbours of each state node are considered and as such s 1 and s 2 will be lumped together first. The next lumping of state nodes is unclear given that both s 3 and s 4 have 1-step neighbors states in different physical node. However, as we increase k we explore more of the local network and at k = 2 , in this example, it becomes clear that s 3 is more similar to s 1 and s 2 . Hence for the second lumping, s 3 is merged with lumped meta node s 1,2 instead of s 4 (see M ′′ in middle of Fig. 8).

Dynamical community detection
Dynamic community detection with Markovian assumptions can still be used to reveal structure in a memory network, simply by applying the same community detection algorithms to the higher-order network structure. M k , for k > 1 , acts to constrain a walkers movement over the physical nodes within its state network connectivity. Hence, if we look for regions across M k that conserve flow from a dynamical perspective, projecting the resultant communities back onto the physical nodes reveals overlapping communities constrained by the transitivity of the state network.
One such example for such a dynamical approach to community detection is Markov stability (MS) (Delvenne et al. 2010), which is the focus for this study. MS exploits diffusion dynamics over an underlying graph structure to reveal a multi-scale community organisation and has been show to be effective in a variety of applications in which multiple scales are expected to exist such as protein sub-structures (Peach et al. 2019a) or social behaviours (Peach et al. 2019b). Given a partition P of nodes into C non-overlapping communities with a N × C community indicator matrix H P the time-dependent clustered autocovariance matrix in MS is given by, where the elements of the matrix [R(t, H P )] correspond to the probability of a random walker starting at node i and ending up in community c at Markov time t minus the probability of that happening by chance.
For an optimal partition P , in which flow is trapped more than one would expect by random over t, we would expect a comparatively large Markov stability With the Markov stability as We aim to maximise r(t, H P ) over the space of possible partitions P at a given Markov time t,  (10) P max(t) = argmax P r(t, H P ).
Whilst the optimisation of Eq. 10 is NP-hard, in practice, heuristics algorithms have been developed which are computationally efficient. Here we use the Louvain algorithm which has has been demonstrated to offer robust solutions at reasonable cost (Blondel et al. 2008).

Identifying partitions of interest over Markov-time
Given a set of partitions that are optimal at each Markov time we must still define which scales are representative or robust in respect to our system. In order to identify partitions of interest over time we look towards two robustness measures. Firstly, we look at consistency of partitions for single points in time, and secondly, we look for stable partitions across time.
To assess this consistency between P at Markov time t we can compute an information-theoretical distance between two alternate partitions P and P ′ is employed: where �(P) is the Shannon entropy, P C being the relative frequency of finding a node in community C in partition P.
Then to quantify consistency at Markov time t we compute the average variation of information of all solutions: For the case that optimisations return near identical partitions V (t) will be small, which indicates robustness of the partition at t. Hence over t we search for partitions with low V (t) .
Relevant partitions should also be remain consist across regions of Markov time. Such persistence is indicated both by a plateau in the number of communities over t and a low value or plateau of the cross-time variation of information:

Multi-scale centralities
For identification of central nodes we use Multiscale Centrality, that enables us to identify nodes that are central at different scales within the network . Multiscale centrality leverages the presence of 'overshooting' peaks that appear in diffusion processes on the graphs. For a more detailed description of overshooting peaks, see . Central nodes are defined as a node, i that breaks the triangle inequality for a pair of nodes j, k, where t ij,τ is the Markov time at which an overshooting peak appears at node j given the diffusive process of an initial delta function at node i which is allowed to diffuse up to Markov time τ.