Evaluating the impact of a simulation study in emergency stroke care

Very few discrete-event simulation studies follow up on recommendations with evaluation of whether modelled benefits have been realised and the extent to which modelling contributed to any change. This paper evaluates changes made to the emergency stroke care pathway at a UK hospital informed by a simulation modelling study. The aims of the study were to increase the proportion of people with strokes that undergo a time-sensitive treatment to break down a blood clot within the brain and to decrease the time to treatment. Evaluation involved analysis of stroke treatment pre- and post-implementation, as well as a comparison of how the research team believed the intervention would aid implementation with what actually happened. Two years after the care pathway was changed, treatment rates had increased in line with expectations and the hospital was treating four times as many patients as before the intervention in half the time. There is evidence that the modelling process aided implementation, but not always in line with expectations of the research team. Despite user involvement throughout the study, it proved difficult to involve a representative group of clinical stakeholders in conceptual modelling and this affected model credibility. The research team also found batch experimentation more useful than visual interactive simulation to structure debate and decision making. In particular, simple charts of results focused debates on the clinical effectiveness of drugs, an emergent barrier to change. Visual interactive simulation proved more useful for engaging different hospitals and initiating new projects. © 2015 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
This paper describes the implementation and evaluation of changes to an emergency stroke care pathway in a large acute hospital within the United Kingdom. These changes followed a discrete-event simulation (DES) study that was undertaken both to identify improvement opportunities and to support the implementation of improvements among the clinical stakeholders in the pathway. The aim of the intervention was to increase the proportion of people with strokes who receive a time-sensitive treatment to break down a blood clot within the brain and to decrease the time to treatment. The evaluation analyses stroke treatment pre- and post-implementation and compares how the research team believed the intervention would aid implementation with what actually happened.
The DES literature contains many case studies of computer models that compare alternative policies to identify costs and efficiency savings within industry [2] and healthcare [3]. While these case studies are numerous, evidence that such modelling leads to the implementation of simulation results is lacking. Although not limited to a particular domain, this lack of implementation evidence has been particularly well documented in systematic reviews of healthcare DES modelling [3-7]. Notably, over the 12 years spanning the publication of these reviews, only a small number of studies describing the implementation of simulation results in healthcare have been published (e.g. [8,9]).
Evaluations of implementation processes are increasingly conducted in other areas of health services research such as health technology assessment [10] and health program evaluation [11], but are rare in Operational Research (OR). A plausible reason for the apparent lack of implementation accounts and follow-up evaluation is the tension between the time needed to implement change within an organisation and the timescale for publication of model results, although it is arguable that such a tension is not unique to DES and OR. One reason that may be specific to OR is the tension between what is seen as legitimate research and what is consultancy [12]. Academics in OR gain little reward for publishing relatively standard models using textbook methodology, although implementing results of such models may be of great help to organisations. On the other hand, evaluation research is valuable to the academic community, particularly in the context of increasing recognition of the need to value the positive impact of research in society. Not only does evaluation demonstrate the effectiveness of, or issues with, the use of methods, but at its core it challenges researchers to revisit and test their assumptions about how they expect a modelling intervention to work [13]. A larger evidence base in the area of evaluation should lead to improved methodology for conducting modelling interventions using methods such as DES.
The contributions of this study are therefore threefold: evidence that the results of healthcare DES modelling interventions are implemented in practice; quantitative evidence that changes recommended by DES can lead to real system improvement and improved stroke patient outcomes; and revised propositions about how simulation modelling interventions aid the chances of implementation.
The paper begins with the background to the simulation study including an overview of the model, expected performance, the changes implemented, and how the research team believed the intervention would work. We then present the results of a quantitative evaluation confirming that the hospital has seen substantial improvement following the study. This is followed by a qualitative comparison of how the research team believed the intervention would support implementation compared to what actually happened. The final section draws together the qualitative and quantitative aspects of the evaluation and assesses the accuracy of how the research team believed the intervention would work. Final comments discuss the need for systematic research into the implementation of results from similar projects.

Thrombolysis for acute ischaemic stroke
Ischaemic events account for over 80% of all cases of stroke [14]. The only licensed treatment for acute ischaemic stroke is thrombolysis with alteplase, a treatment intended to restore blood flow within an artery occluded by thrombus (blood clot). Due to the high metabolic demands of brain tissue, the effectiveness of thrombolysis is critically time dependent [15,16]. The earlier a patient receives treatment, the greater the chances of recovery with minimal or no disability, such that the effectiveness of the treatment halves with each 90 min period that passes from onset [17]. As with all drug treatments there are also risks. In this case treatment increases the risk of symptomatic intracranial haemorrhage (SIH: bleeding within the brain), which often leads to death. However, when treatment is given within 6 h of onset, the accumulated evidence shows that the benefit of stroke thrombolysis in reducing disability outweighs the risk of intracranial haemorrhage [16,18].
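As a rough illustration of the time dependence described above, the halving statement can be read as exponential decay in relative treatment effect. The sketch below is only a literal reading of that sentence: it is not a clinical model and not the relationship used in the simulation study, and the function name, the use of onset as the reference point and the 90 min default are assumptions made for illustration.

```python
def relative_effectiveness(minutes_from_onset: float,
                           half_life_min: float = 90.0) -> float:
    """Relative treatment effect versus treatment at onset (1.0 = full effect).

    Assumes a simple halving every `half_life_min` minutes, which is a
    literal reading of the text and not a fitted clinical model.
    """
    if minutes_from_onset < 0:
        raise ValueError("minutes_from_onset must be non-negative")
    return 0.5 ** (minutes_from_onset / half_life_min)


# Relative effect after one, two and three 90 min periods.
for minutes in (90, 180, 270):
    print(minutes, relative_effectiveness(minutes))
```

Under this reading, a patient treated at the three-hour limit would receive roughly a quarter of the benefit available at onset, which is why the pathway changes target in-hospital delay.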
In Europe, alteplase was originally licensed in 2003 for use within a three-hour period from the onset of ischaemic stroke. In that time the patient needs to travel to hospital and be assessed and treated in an emergency department (ED), including brain imaging. Uptake of the treatment has been slow, often because of difficulties with completing the diagnostic process within the short time window, with between 3.5% and 5% of patients receiving the treatment [19]. Efforts to increase this proportion have focused on two areas: randomised controlled trials (RCTs) assessing the efficacy of extending alteplase treatment from three to four and a half hours (or beyond); and public education campaigns to increase awareness of stroke symptoms (e.g. the Act FAST campaign in the UK) in order to encourage earlier presentation to hospital with suspected stroke. The benefit of thrombolysis is measured in terms of the increase in the proportion of patients with minimal or no disability at follow-up (usually 90 days), that is, those attributed a modified Rankin Scale (mRS) score of 0 or 1. The mRS is an ordinal scale of disability scoring between 0 (no symptoms or disabilities) and 6 (death) [20,21].

The modelling intervention
Similar to many other hospitals in the UK and elsewhere, our hospital treated 4%-5% of all acute strokes annually with alteplase, with the majority of treatment delivered close to the three-hour treatment deadline. The project reported here was initiated in late 2010 as a collaboration between hospital clinicians and medical school academics to investigate the most effective operational changes that could be made to increase thrombolysis rates and reduce stroke-related disability. We chose to use DES to model the stroke pathway as we believed it represented a compromise between the expert and facilitative modes of engagement with stakeholders [22] that others have described as pseudo-facilitative [23]. We chose this approach based on three core beliefs. First, we believed that in order to achieve any agreement on change within the hospital we needed to operate in the facilitative mode of engagement during conceptual modelling [24], aiding the relevance, transparency and credibility of results to the stakeholders' problem. Second, DES provides the opportunity to use visual interactive simulation (VIS). We believed that VIS would increase the engagement of stakeholders in validation and experimentation and thereby improve the transparency of the model; engagement and transparency are both prerequisites for effective implementation [25]. Third, we believed that modelling in general would provide a common reference point and structure debate between stakeholders with competing interests. These three hypotheses represent how the research team expected the modelling intervention to support the implementation of the results of the DES study. The final section of this paper reflects on these hypotheses and evaluates whether these assumptions were indeed the key factors that aided implementation.

The simulation model
The DES model focuses on the emergency phase of the stroke pathway. Other modelling studies of stroke care and discussions of their potential benefits have been published [26-33], but none provide any details of implementation or health impact. For brevity this section provides a high-level description of the model. More detail on model inputs and logic pathways can be found in the online supplementary material (see Appendix A); details to fully replicate the model can be found elsewhere [1].

Model logic
The model is divided into four sections: a pre-hospital phase, an ED phase, a referral phase, and a phase where the acute stroke team (AST) takes responsibility for the patient. The main problem identified at the project hospital was that in many cases the referral phase was delayed. That is, patients with suspected stroke arriving at the hospital were subject to the common delays in triage and assessment experienced in busy EDs and became ineligible for the time sensitive treatment. A key component of the process is emergency brain imaging with a CT scan (during the AST phase). This is used to rule out brain haemorrhage as the cause of stroke.
Two key components of the in-hospital delay to assessment and treatment that were simulated were the time from arrival to CT scan (ATS) and from scan to treatment. A full pathway diagram is included in the online supplementary material (see Appendix A).

Outputs
The model was used to explore the impact of different configurations of the stroke pathway in terms of thrombolysis rates, time to treatment, urgent clinical workload and ultimately patient disability at 90 days. Disability measures were represented as the number of additional patients with a 90-day mRS of 0-1 attributable to treatment (as some patients will recover without treatment).

Experimental factors
The main experimental factors were the paramedic pre-alert rate (where paramedics phone ahead alerting clinicians to the imminent arrival of a patient with suspected stroke), the ED triage referral rate (where triage nurses contact the AST as they encounter patients with suspected stroke), the time window and the likelihood of further contra-indications to treatment, most notably intracerebral haemorrhage. Paramedic pre-alerts and early referral at triage were chosen as they provide the earliest possible points for emergency referral to the AST. Paramedic pre-alerts, in most instances, allow for a specialist stroke nurse practitioner to meet patients with suspected stroke at the doors of the ED. The inclusion of contra-indications was not a practical implementation aspect to explore, but it is an important parameter to help quantify the uncertainty in our model outputs.
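To make the role of the two referral factors concrete, the following is a heavily simplified Monte Carlo sketch of patients passing through the pathway. It is not the published model: it omits queues, resources and contra-indications, and every distribution, duration and name in it (for example `simulate_treatment_rate`, the exponential delays and the 10 min triage time) is a hypothetical placeholder chosen only to show how higher pre-alert and triage referral rates shorten the referral phase.

```python
import random


def simulate_treatment_rate(pre_alert_rate, triage_referral_rate,
                            n_patients=10_000, window_min=180.0, seed=1):
    """Fraction of simulated patients treated inside the time window.

    All distributions and durations below are hypothetical placeholders,
    not the inputs of the published model.
    """
    rng = random.Random(seed)
    treated = 0
    for _ in range(n_patients):
        # Draw every random quantity up front so that scenarios run with the
        # same seed share random numbers and differ only in referral route.
        onset_to_arrival = rng.expovariate(1 / 90.0)  # pre-hospital phase
        u_pre_alert = rng.random()
        u_triage = rng.random()
        ed_wait = rng.expovariate(1 / 60.0)           # routine ED assessment
        ast_time = rng.expovariate(1 / 40.0)          # AST assessment, scan, treatment

        if u_pre_alert < pre_alert_rate:
            referral_delay = 0.0             # AST meets the patient at the door
        elif u_triage < triage_referral_rate:
            referral_delay = 10.0            # referred straight from triage
        else:
            referral_delay = 10.0 + ed_wait  # referred after routine ED assessment

        if onset_to_arrival + referral_delay + ast_time <= window_min:
            treated += 1
    return treated / n_patients


baseline = simulate_treatment_rate(pre_alert_rate=0.1, triage_referral_rate=0.1)
improved = simulate_treatment_rate(pre_alert_rate=0.8, triage_referral_rate=0.8)
print(f"baseline: {baseline:.3f}, improved: {improved:.3f}")
```

Because both scenarios reuse the same random draws, any difference between `baseline` and `improved` comes from the referral route alone. The absolute values have no clinical meaning.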

Summary of model results
Model results were presented to the clinical stakeholders using a standard pairwise comparison approach for scenarios [34], particularly making use of graphical plots to communicate differences. Here we focus on the uncertainty in the model results.
To illustrate the results of the model we include a 2^3 factorial design including two of the early referral parameters and the proportion of exclusions between midnight and 11 am (due to our concern about underestimating the proportion of 'wake-up' strokes where the onset time is unknown). Full results, main and interaction effects are provided in the online supplementary material (see Appendix A).
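A two-level, three-factor design of this kind has eight scenarios. The sketch below enumerates them; the factor names and the "low"/"high" labels are assumptions for illustration (the actual levels are given in the supplementary material), and we assume the two referral parameters are the paramedic pre-alert rate and the ED triage referral rate.

```python
from itertools import product

# Hypothetical factor names and level labels, for illustration only.
factors = {
    "paramedic_pre_alert": ("low", "high"),
    "ed_triage_referral": ("low", "high"),
    "overnight_exclusions": ("low", "high"),
}


def full_factorial(factor_levels):
    """Return one dict per scenario covering every combination of levels."""
    names = list(factor_levels)
    return [dict(zip(names, combo)) for combo in product(*factor_levels.values())]


design = full_factorial(factors)
for scenario in design:
    print(scenario)
```

Each scenario would then be run for a number of replications and compared pairwise, as described in the summary of model results.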
The range of uncertainty for the overall thrombolysis rate was predicted to be between 7.9% and 14.2% of patients. More importantly, the results demonstrated the critical importance of high compliance with the referral protocol by paramedics and at ED triage. Low compliance in one while the other is high was expected to reduce the proportion treated by 1.1% (95% CI 1.0%-1.3%). As such, the study recommended that the hospital and the local ambulance service implement robust protocols for pre-alerts both pre-hospital and within ED.

Implementation of results
The development of the model and its use took place over six months (January-June 2011). Implementation took place over the following year and was led by the AST with support provided by a simulation modeller. The project timeline is illustrated by Fig. 1. Implementation of the results of the study was phased: from December 2011 (phase 1) stroke patients were referred from ED triage directly to the AST, and from August 2012 (phase 2) paramedics began phoning the AST to alert them prior to suspected stroke arrivals in the ED. In addition to the recommendations of the DES study, the hospital also extended the alteplase protocol to treat patients over the age of 80 from May 2012 following the publication of a large randomised controlled trial [35].

Quantitative evaluation
We follow the overview of the simulation study with a quantitative evaluation comparing stroke thrombolysis pre- and post-implementation. The post-implementation data cover a period of 21 months. This section summarises the study design (the full protocol can be found in the online supplementary material; see Appendix A) and presents the results.

Study design
Four treatment variables are evaluated using a pre-intervention post-intervention design: the time taken from a patient arriving at hospital to brain scanning (ATS), the time taken from arrival to treatment (ATT), the thrombolysis rate (the number of patients receiving thrombolysis as a proportion of the total number of strokes in the time period) and the number of adverse events including SIH. We also measured compliance with the early referral protocols. Data were collected prospectively from 01/01/11 until 31/08/13. Statistical tests are described in the online supplementary material (see Appendix A).

Participants
Detailed thrombolysis data were available from 2007 until September 2013. We limited the analysis of thrombolysis rates in the pre-implementation period to between 01/01/09 and 30/11/11, as absolute numbers of thrombolysed patients were notably lower prior to 2009. The post-implementation period ranged from 01/12/11 until 31/08/13. In total this provided 2930 cases of stroke (1851 pre and 1088 post-implementation). Fig. 2 illustrates this breakdown in more detail.

Main results
The total proportion of strokes thrombolysed pre-implementation was 4.7% compared to 11.5% post-implementation.
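For readers who want a feel for the size of this change, the sketch below computes a simple normal-approximation (Wald) interval for the difference between the two proportions. This is not the statistical test used in the study (those are described in the supplementary material); it takes the reported rates together with the pre- and post-implementation case counts given under Participants as assumed inputs, and the function name is hypothetical.

```python
import math


def diff_in_proportions_ci(p1, n1, p2, n2, z=1.96):
    """Wald interval for p2 - p1; returns (difference, lower, upper)."""
    diff = p2 - p1
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se


# Reported rates and case counts, used here as approximate inputs.
diff, lower, upper = diff_in_proportions_ci(p1=0.047, n1=1851, p2=0.115, n2=1088)
print(f"difference: {diff:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

An interval that excludes zero is consistent with a genuine increase in the thrombolysis rate, although a pre-post design of this kind cannot by itself rule out other contributing changes.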

Subgroup analyses
Before the modelling study took place, average ATT appeared to be increasing. Average ATT was notably higher in 2010/11 (101 min; 95% CI 90.1-111.2) compared to the period preceding 2010 (76 min; 95% CI 67.7-83.5), skewing improvement estimates. We therefore conducted a subgroup analysis limiting analysis of ATT to patients treated between 01/01/2010 and 31/08/2013, giving a difference in average treatment speed of 37 min (95% CI 25.7-48.9) following implementation. Fig. 3 illustrates the thrombolysis rate by half-yearly interval starting from mid-2007 (July to September 2013 not shown). When results are limited to phase 2 of implementation (from August 2012) the hospital achieves its highest thrombolysis rates: 14.5% overall, 11.0% excluding patients over 80. ATT continued to fall in this time period, reaching an average of 55.5 min (95% CI 46.5-64.5), a reduction of nearly 50% compared to 2010/11.

Adherence to paramedic stroke referral protocol
A total of 671 patients with a diagnosis of stroke were admitted after phase 2 implementation; 504 of these arrived by ambulance. Cases of suspected stroke are identified by ambulance paramedics using the Face, Arms, Speech and Time (FAST) test. Given that average FAST diagnostic sensitivity has a 95% confidence interval of 76%-85% [36], we estimate that between 383 and 433 of these patients would be identified by paramedics en route to hospital (assuming that a diagnostic test was applied in all cases).
A total of 201 pre-alerts were received post phase 2. Using the estimated FAST positive numbers we converted this figure into an average adherence to the pre-alert protocol of 46% (201/433) to 52% (201/383). These figures are skewed by the gradual effect of dissemination in the months following the protocol implementation. If all 2012 data are excluded, adherence rises to 63%-71%.
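The adherence range can be reproduced with a few lines of arithmetic. The sketch below uses the reported pre-alert count and the reported FAST-positive bounds; the variable and function names are hypothetical. Note that applying an 85% sensitivity directly to 504 ambulance arrivals gives roughly 428, slightly below the reported upper bound of 433, so the adherence calculation here uses the reported bounds rather than recomputed ones.

```python
def adherence_range(pre_alerts, fast_positive_low, fast_positive_high):
    """Return (lowest, highest) adherence to the pre-alert protocol as fractions."""
    return pre_alerts / fast_positive_high, pre_alerts / fast_positive_low


ambulance_arrivals = 504
sensitivity_low, sensitivity_high = 0.76, 0.85

# Expected FAST-positive counts implied by the sensitivity interval.
estimated_low = ambulance_arrivals * sensitivity_low    # about 383
estimated_high = ambulance_arrivals * sensitivity_high  # about 428

# Adherence using the bounds reported in the text (383 and 433).
low, high = adherence_range(pre_alerts=201,
                            fast_positive_low=383,
                            fast_positive_high=433)
print(f"{low:.0%} to {high:.0%}")  # prints "46% to 52%"
```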

Qualitative evaluation
Drawing on our knowledge of the involvement hypothesis in the DES and System Dynamics (SD) literature [37] and experiential knowledge from clinical practice, we developed three hypotheses that reflected our belief in the importance of adopting a facilitative approach and involving clients in modelling.
Hypothesis one draws from the PartiSim (Participative Simulation) framework for healthcare simulation [24]. Involvement of clients from different parts of the system in conceptual modelling is reported to increase the credibility of results as objectives and the modelling process are clearer to clients. Although we ran workshops involving a group of stakeholders in conceptual modelling, we chose not to adopt the full Soft Systems Methodology (SSM) approach advocated in the PartiSim framework in favour of simpler brainstorming and process mapping exercises.
Hypothesis two reflects one of the core rationales for VIS. If system behaviour is communicated to clients as a model runs, either for validation or experimentation, then this is proposed to facilitate trust in the model and results and hence increase the chances that results are implemented [25,38].
The final hypothesis is based on reported benefits of engaging clients in facilitative modelling [22]. Although often associated with problem structuring methods, for example SSM [39] or Strategic Options Decision Analysis [40], facilitative approaches have been shown to be feasible in DES studies [38,41,42], for example SimLean facilitate [42], and Group Model Building in SD [43]. The process of developing a model is often argued to foster the development of a common language or point of reference for stakeholders that improves the quality of debates about action [44].
Stated concisely, our three hypotheses were:
• If stakeholders are involved in conceptual modelling then the relevance, transparency and credibility of results to stakeholders are increased.
• If stakeholders can be engaged by VIS in validation and experimentation then the resulting model transparency and working relationships increase the use of results.
• If stakeholders with competing interests are engaged in a modelling project then debate about implementation is structured by the model acting as a common reference point.
This section provides a qualitative evaluation of implementation by comparing these hypotheses to what actually happened in the intervention. Data collection was through the lead modeller's field notes (TM). In instances where the modeller was leading a meeting, for example when validating the model, conversation was recorded and field notes were made afterwards. In more sensitive meetings, i.e. those discussing implementation, conversations were not recorded and notes were made from memory immediately following the meeting.

Hypothesis one: user involvement in conceptual modelling
The acute stroke team (AST), which comprises stroke physicians and specialist nurse practitioners, approached the medical school to initiate the intervention and was subsequently heavily involved throughout the study. We also believed that conceptual modelling should involve the ED, as a substantial part of the emergency pathway (and controllable processes) occurred there. We did not involve the ambulance service in conceptual modelling at the outset, but provided them with provisional results once available.

Involving the ED
We engaged the clinical lead of the ED at project initiation, as it was felt that he would represent the views of the wider group of consultants in ED. The preliminary project meeting took place in October 2010 and was used to define clear performance measures and scenarios from the outset. In attendance were a modeller from the medical school, the clinical lead of the ED and the stroke physician who would act as the project liaison between the medical school and the hospital. The ED clinical lead was involved in making two key contributions to the conceptual model: key performance measures would include the proportion of strokes treated and treatment speed from arrival at the hospital; while scenarios should compare the impact of early referral after triage to that of extending the licensed window for thrombolysis from 3 to 4.5 h after onset. The meeting ended with an agreement that the AST would lead the project and report the results of the modelling back to the ED.

Involving the AST
The AST added further model outputs and experimental factors to the conceptual model in the early part of 2011. There were two new key performance measures to include: the increase in prioritised scans (to assess workload changes) and the number of patients with minimal or no disability at 90 days due to treatment. The lead AST clinician felt that the latter measure was less abstract than time to treatment and would be important when communicating the results to the wider clinician audience and the ambulance service. The AST, particularly the specialist nurse practitioners, were also involved in process mapping and model validation workshops. This prompted a further experimental factor to be added to the model: the rate at which paramedics pre-alert the AST to imminent stroke arrivals.
Implementation of a pre-alert system was thought to be plausible, as ambulance crews already provided pre-alerts to the ED for other emergency conditions. A potential barrier to implementation was thought to be a change of control in the pathway, i.e. a move from the ED to the AST. A plausible solution was to design an information chain in which the ambulance crews pre-alert the ED and the ED then passes the information to the AST. Although some information would risk being lost in such a system, it was believed that the wider group of ED consultants were more likely to agree to implementation on these terms. There was also some concern that prioritised brain scanning might lead to many false positives, typically identified by an ED consultant, consuming resources that would be better used elsewhere. As such, the scenarios were designed to include specialist assessment by an AST nurse prior to brain scanning.

Influence on implementation
The modelling was completed in six months with results reported to the AST and ambulance service shortly afterwards. Although engaged at a relatively late stage, the reaction from the ambulance service was supportive, observing that it was quite rare to get any feedback on what they as paramedics can do to influence patient outcomes. They stated that the outputs from the model allowed them to see what disability impacts might be if they pushed for high pre-alert rates for all FAST test positive cases of suspected stroke, an area where it is typically very difficult to make any difference. Interest was such that the ambulance service were keen to work more closely with the medical school to investigate factors during ambulance callout, pickup and travel that might help increase treatment rates further.
While we were able to quickly disseminate the results to the AST and ambulance service, we were unable to organise a meeting with the ED until five months after the modelling was completed.
The meeting included all consultants from the ED (including the clinical lead) and was led by the AST with the medical school providing support regarding the modelling.
The initial reaction of a subgroup of ED consultants to the modelling results was negative. Discussion of the model assumptions revealed that this group had concerns about the evidence base for thrombolysis and had further concern about the increased proportion of patients suffering SIH leading to higher mortality rates, a measure that was not modelled explicitly (our mRS 0-1 measure implicitly incorporates those patients suffering SIH and going on to have good functional outcome). This was a surprising reaction, given our previous experience with the ED clinical lead and that the most up-to-date evidence clearly supported the opposite view on the value of thrombolysis [15]. It was apparent that the emergency medicine literature took a different position on acute stroke thrombolysis to the neurology literature. In particular, the re-analysis of an early RCT by an opponent of thrombolysis argued that there was no evidence of a time-dependent benefit [45], although this work was subsequently disputed [46].
Our conceptual model, therefore, was missing a key output measure: the proportion of patients suffering early SIH due to treatment. We did however have a simple model of SIH available: we expect that 4%-7% of treated patients will suffer this complication [18]. We had not included this in the model as there was no evidence that SIH was related to onset to treatment times. Nonetheless, this exclusion affected study credibility among the ED clinicians.
With regard to the control of the pathway and screening of false positives before brain scanning, the reaction was more positive. The consensus from the ED clinicians was that implementation would work more smoothly if pre-alerts went directly to the AST. This was due to the high number of pre-alerts the ED already received for a variety of emergencies, and recognition that reducing the information chain meant a more robust solution. Our design for screening for false positives by AST prior to any brain scanning was also acceptable to the ED consultants, recognising the specialist knowledge of the AST specialist nurses.
The outcome of the meeting was positive and swift. The ED clinical lead contacted the AST the following week (December 2011) and confirmed that the ED would go ahead with a change in the pathway. In summary, we found that involvement of stakeholders in conceptual modelling did help with the relevance of findings leading to implementation; however, selection bias meant that stakeholders did not represent the full range of views on change.

Hypothesis two: VIS
We primarily used VIS during model development between March and June 2011. Although this was originally planned to facilitate both validation and experimentation it was mainly used by the lead modeller for the former. Much of the discussion was therefore of benefit to the modeller rather than the stakeholders and does not support the original assumption. For example, a specialist nurse would be shown a simulated patient arriving at different times of the day to illustrate how the model passed information about the patient between clinical groups. The nurse would then help clarify the order of information exchange.
In contrast to VIS the batch run results provoked more lively responses from stakeholders and prompted discussions of implementation. As we have already reported, both the ambulance service and ED were responsive to simple charts illustrating the differences in system performance across scenarios (for very different reasons).
We also used VIS after project completion (July 2011 onwards) to illustrate the simulation approach to other hospitals. As the results of the simulation study were disseminated throughout the regional stroke network the medical school was approached by four further trusts (three hospitals and the ambulance service) with interest in implementing similar simulation projects within their trust. We met with each of these trusts and visually demonstrated the model and scenario comparison. In these cases the use of VIS achieved more engagement and discussion of possible implementation options at the trust. Between December 2011 and January 2013, simulation modelling was conducted with two of these trusts while, due to data availability, statistical analysis and visualisation of patient pathways were used with the remaining two.

Hypothesis three: a common reference point
The clearest example of debate about change was in the final project meeting between the ED clinicians and the AST lead. Both parties had the patient's interest at heart yet both took radically different views on how to manage emergency stroke patients. Here the model results both for treatment rates and patient disability were the most useful. Although some consultants reacted negatively to the proposals, their objections used the model results as a common reference point in their arguments; for example, "are those disability results real in the pre-alert scenario?". This prompted a discussion of the clinical assumptions included within the model and provided a forum for clinicians with concerns to air their views. This evidence agrees with our initial assumption about the model acting as a common reference point.
The exact influence of this imposed structure on the success of implementation is difficult to discern from a single case study. However, we propose that the process of setting up a meeting with ED and presenting quantified results may have helped change clinicians' attitudes. Some evidence of this can be seen in the comments from two ED clinicians towards the end of the meeting. In particular, they appreciated the time taken to perform the analysis and meet with them for a discussion as opposed to 'the usual approach of specialities e-mailing a demand'. In other words the more collaborative nature of our engagement within their organisation persuaded them to positively consider the changes.

Discussion
This paper makes two contributions to the DES and OR literature. Firstly, evidence of the effectiveness of simulation in improving health systems is provided through our evaluation of the impact of implementing the results of a DES study in emergency stroke care. Secondly, we compare the research team's assumptions about how the intervention was supposed to work to what actually happened to assist planning of OR projects to maximise influence and impact. We report these contributions in this section as follows. Initially we summarise the utility of the simulation by comparing the actual outcomes of implementation to those predicted. Then we critique the three assumptions regarding the use of modelling and implementation set out in the introduction. We close the section with a discussion of the strengths and limitations of our evaluation.

Model utility and validity
Our results clearly demonstrate that improvement occurred in the hospital's management of emergency stroke patients over the timescale of the project. More stroke patients are treated in shorter periods of time, particularly after phase 2 of implementation where the new thrombolysis rate represents a threefold increase in the number of treated stroke patients under the age of 80. These improvements, falling into the middle of the range predicted by our model, are substantially higher than other recent thrombolysis improvement initiatives [47,48]. We also observed a gradual growth in the proportion of patients treated as opposed to a step change, due to a dissemination effect where paramedic pre-alerting rose gradually from around 25%-30% in the latter part of 2012 to 60%-70% in 2013.
Although we had no over 80s population to compare to pre-implementation, we know that these patients will also benefit from the more responsive process post-implementation. As average treatment time approximately halved, we can also expect that more over 80s patients are treated than would otherwise have been. Our model predicted that treatment of the over 80s would increase from 7 to 31 patients per year using the paramedic pre-alert system.
We used a conservative approach to confirm improvement by including the period in which the simulation study was conducted within the pre-implementation period. It is noteworthy that the thrombolysis rate observed pre-implementation is skewed by the period in which the simulation study was conducted. Our explanation of this 'during' effect relates to the nature of the change. Implementation relates not only to hospital protocols, but also to the relationships that the specialist stroke nurses have with their colleagues in ED and radiology. Anecdotally they reported an improvement in these relationships through their participation in the study. This in effect means that the triage referral system was implemented, to some extent, before the official sign-off in the ED.
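The effect of this conservative choice can be illustrated with a small calculation. The Python sketch below uses entirely hypothetical patient counts (they are not the study's data, and the names `Period` and `rate` are introduced here only for illustration). It shows why counting the study period as pre-implementation shrinks the apparent improvement whenever the treatment rate had already begun to rise during the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """Stroke admissions and thrombolysed patients in one observation period."""
    name: str
    admitted: int
    treated: int


def rate(*periods: Period) -> float:
    """Pooled thrombolysis rate across one or more periods."""
    admitted = sum(p.admitted for p in periods)
    treated = sum(p.treated for p in periods)
    return treated / admitted


# Hypothetical counts, chosen only for illustration (NOT the study's data).
before = Period("before study", admitted=400, treated=16)
during = Period("during study", admitted=200, treated=16)
after = Period("post-implementation", admitted=400, treated=48)

# Strict baseline: only the period before the simulation study began.
strict_ratio = rate(after) / rate(before)

# Conservative baseline: the study period is counted as pre-implementation,
# so any early gains made during the study raise the baseline rate.
conservative_ratio = rate(after) / rate(before, during)

print(f"strict baseline rate:        {rate(before):.3f}")
print(f"conservative baseline rate:  {rate(before, during):.3f}")
print(f"improvement vs strict:       {strict_ratio:.2f}x")
print(f"improvement vs conservative: {conservative_ratio:.2f}x")
```

With these illustrative counts the conservative baseline yields the smaller improvement ratio, so any improvement confirmed against it is, if anything, understated.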
The utility of the model was affected by our decision not to incorporate the incidence of SIH resulting from thrombolysis. This decision was made with user involvement, although this figure turned out to be influential with the ED clinicians not involved in the study. As expected, the actual incidence of SIH pre- and post-implementation was similar and within expected limits from the evidence base [18]. In terms of the validity of our model, the decision not to explicitly incorporate SIH as an outcome measure is legitimate; however, it did cause credibility issues with a group of ED clinicians and serves as a clear reminder of the difference between validity and credibility [49].

Did the simulation study work as expected?
To effectively evaluate a modelling intervention the underlying assumptions about how the intervention was supposed to work must be surfaced [13]. Table 2 summarises the three assumptions outlined in the introduction to the research against evaluation findings and several propositions refined on the basis of the evaluation.
Our first hypothesis was that involvement of stakeholders in conceptual modelling aids the relevance, transparency and credibility of results and hence chances of implementation. We involved representatives from both the ED and AST at the beginning of the study. However, the views expressed by the ED clinical lead were not fully representative of the larger clinical group, a finding which surfaced substantially later in the project. As a result, we neglected to include a key output (risk of SIH) in our model. The impact of this decision on model credibility demonstrates that involvement in conceptual modelling can indeed be a critical component of implementation. We note, however, that in practice the stakeholder group involved in conceptual modelling has to be limited in size; hence it is difficult to judge the representativeness of the wider stakeholders. Our experience at the other hospitals, where ED was more extensively involved throughout, also suggests that selection bias is in play. The representatives from ED who wished to be involved in the project were those with the most positive views towards the treatment. Although selection bias in the main project group will be difficult to eliminate entirely, it is likely that members of the project group will be aware to some extent of colleagues' views. A simple approach within OR interventions, therefore, is to interview stakeholders individually and ask what social or political factors affect performance improvement. This approach is similar to the initial stages of SSM, i.e. investigating the social norms and roles within an organisation. This line of questioning may also help identify key individuals whose views are not currently represented by the project group and who could be invited to join.

Table 2 Hypotheses about the modelling process, findings and post-evaluation propositions.

Hypothesis 1. If stakeholders are involved in conceptual modelling (CM) then the relevance, transparency and credibility of results to stakeholders are increased.
Findings
1.1 ED were involved in CM right at the beginning; however, it proved difficult to involve a representative group of clinicians
1.2 Individual ED clinicians not involved in CM were critical of the evidence base for thrombolysis, but once involved in discussion of results were able to reach a consensus that changes should be implemented
1.3 The ambulance service was not involved in CM, but were keen to implement results
Post-evaluation propositions
• If stakeholders who are representative of the wider organisation are involved in CM then the relevance, transparency and credibility of results to stakeholders are increased.
• If stakeholders who have not been involved in CM are later involved in discussion of results then their focus on process improvement can be developed or reinforced.

Hypothesis 2. If stakeholders can be engaged by VIS in validation and experimentation then the resulting model transparency and development of working relationships increase the use of results.
Findings
2.1 Little evidence that VIS increased engagement within the project; more useful for validation of model logic.
Post-evaluation propositions
• If stakeholders are unfamiliar with modelling then using VIS in validation and experimentation improves initial engagement and buy-in; but to improve the use of model outputs (in a clinical setting) requires the presentation of results in a more conventional scientific form.

Hypothesis 3.
Findings
3.2 ED clinicians were more willing to consider the results given the manner in which the study was conducted
Post-evaluation propositions
• If a common forum for the mutual exchange of information and experiential knowledge can be developed then the uptake and integration of different forms of knowledge are increased.
In contrast to the ED and AST, the ambulance service was not involved at all during conceptual modelling or the larger intervention. However, results were welcomed with enthusiasm. A possible explanation of this is that ambulance services are highly focused on continual improvements in responsiveness. The changes recommended by the study reinforced these organisational objectives and the manner in which the ambulance trust conceptualised problems. Therefore, although involvement in conceptual modelling was not essential in this case, inclusion of the ambulance trust in discussion of results was essential. In the ED several senior clinicians were more focused on the medicine than on the process. Involvement of these individuals in debate around results was also essential as it provided an opportunity to develop their focus on process improvement. This latter hypothesis also provides a practical workaround for the selection bias issue for the main stakeholder group involved throughout an intervention.
Our second hypothesis concerned the use of VIS to increase stakeholder engagement. We found little evidence of this within the project. In fact many clinical stakeholders were more interested in the batch run results presented using simple charts. The main use of VIS, therefore, was as part of model validation, i.e. stepping through model logic with the AST. A possible explanation is that the presentation of results in medical science, i.e. tables of summary statistics and measures of accuracy, is more akin to batch run results in simulation than process animations. The more 'scientific' appearance of these results in this context was therefore more engaging for clinicians. Unexpectedly, the model was much more engaging to stakeholders outside of the project. VIS was particularly helpful in engaging clinicians and managers from elsewhere who heard about the work through the local NHS Stroke Network and were keen to replicate it at their hospitals. This less-tangible aspect of dissemination, if not implementation, is perhaps overlooked when discussing the impacts of a simulation study. Thus we propose an alternative formulation of our original hypothesis about VIS: if stakeholders are unfamiliar with modelling then using VIS in validation and experimentation improves initial engagement or buy-in to simulation. Further research may wish to consider if these types of effects are specific to modelling techniques that lend themselves to animation or if this is a more general benefit across modelling.
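To make the distinction concrete, the following Python sketch shows the kind of batch experiment whose output is a small table of summary statistics and not an animation. It is a minimal illustration under stated assumptions, not the model used in this study: it is a simple Monte Carlo calculation with no queues or resources, not a full discrete-event simulation, and every parameter (the treatment window, the delay distributions and their means, the scenario names) is a hypothetical placeholder.

```python
import random
import statistics
from typing import Tuple

# All parameters below are hypothetical placeholders, not values from the study.
TREATMENT_WINDOW_MIN = 270.0   # assumed cut-off for onset-to-treatment time
MEAN_ONSET_TO_ARRIVAL = 180.0  # assumed mean delay before reaching hospital
PATIENTS_PER_RUN = 500
REPLICATIONS = 30


def run_once(mean_in_hospital_delay: float, rng: random.Random) -> float:
    """Return the proportion of patients treated within the window in one run."""
    treated = 0
    for _ in range(PATIENTS_PER_RUN):
        # expovariate takes a rate, i.e. 1 / mean.
        onset_to_arrival = rng.expovariate(1.0 / MEAN_ONSET_TO_ARRIVAL)
        arrival_to_treatment = rng.expovariate(1.0 / mean_in_hospital_delay)
        if onset_to_arrival + arrival_to_treatment <= TREATMENT_WINDOW_MIN:
            treated += 1
    return treated / PATIENTS_PER_RUN


def batch(mean_in_hospital_delay: float, seed: int) -> Tuple[float, float]:
    """Run independent replications; return the mean and a 95% CI half-width."""
    rng = random.Random(seed)
    results = [run_once(mean_in_hospital_delay, rng) for _ in range(REPLICATIONS)]
    mean = statistics.mean(results)
    # Normal approximation to the confidence interval across replications.
    half_width = 1.96 * statistics.stdev(results) / REPLICATIONS ** 0.5
    return mean, half_width


# Hypothetical scenarios: mean in-hospital delay to treatment, in minutes.
scenarios = {"current pathway": 90.0, "paramedic pre-alert": 45.0}

# The same seed is reused so both scenarios see the same stream of random draws.
results = {name: batch(delay, seed=2024) for name, delay in scenarios.items()}

for name, (mean, half_width) in results.items():
    print(f"{name:<22} treated proportion = {mean:.3f} +/- {half_width:.3f}")
```

Each scenario is reduced to a point estimate and an interval, which is close in form to the summary statistics and measures of accuracy that clinicians are used to reading.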
Our third hypothesis concerns using a model to provide stakeholders with a common reference point to debate change, an idea that is well documented within OR. In our study the model facilitated the discussion of implementation by providing a structure to scrutinise the clinical assumptions that underpinned the results. The validity of these assumptions and objections was debated at length within the ED clinician meeting. The model acted as a reference point for this discussion (for example, "are those disability results real?" and "in your model you pre-alert the ED, wouldn't it be better to...") and helped progress to be made in reaching a final decision on implementation within the ED, despite the objections of some staff members. Outside of the group concerned about the clinical effectiveness of thrombolysis, engaging the ED team with a model, in a collaborative manner that was unusual to them, also helped with the credibility of recommendations. Hence an additional proposition for implementation theory may be the idea of a collaborative forum for the mutual exchange of information and experiential knowledge. If this can be developed within a modelling project then the uptake and integration of different forms of knowledge are increased.
In summary, our DES study was able to inform and support a dramatic improvement in both the treatment rate for stroke patients and the speed with which the treatment was delivered, but the modelling process did not always work as expected. In particular, we draw attention to the practical difficulties we faced with gaining a representative understanding of ED's view on the problem and the use of VIS as an initial buy-in tool rather than a device for continual engagement.

Limitations
While other studies of implementation have used qualitative methods only (e.g. [8,50]), a particular strength of this evaluation is the use of both the quantitative evaluation of system efficiency and qualitative critique of the process that led to implementation. However, a weakness is that the qualitative study is based purely on the lead modeller's field notes, which provide only a single perspective that may be subject to some bias. A more systematic approach would incorporate an independent researcher to develop and document the evaluation throughout. Such evaluations might make use of systematic methods for extracting group decision development process data [51], frameworks for investigating implementation processes such as the Promoting Action on Research Implementation in Health Services (PARIHS) framework [52] and sampling of process measures informed by the principles of Improvement Science [53].
Similarly, we note that we framed the evaluation in terms of the research team and the hospital and did not include users that were not directly involved [54], i.e. service users: both carers and patients. We found the tracking of disability measures post-stroke unfeasible for this project, and even if they had been included we would not have gained an understanding of the perceived quality of care and outcomes. A possible way to mitigate this in future studies may be the involvement of service user representatives throughout a simulation study and during implementation of its results [55].

Conclusions
Most published case studies of DES in health do not provide any discussion of the implementation of study results [5] or any structured evaluation. In order to provide an evidence base of the impact of simulation modelling in practical applications it is essential to publish accounts of implementation. One way to build this evidence base is for case study authors to provide an evaluation of implementation (or improved understanding) in which the modelling process assisted, which may involve longer timescales than anticipated. To contribute to this growing area, this paper presents an evaluation of implementing the results of a DES study of emergency stroke care. The results demonstrate that the simulation did contribute to improvement, but change was difficult and the intervention did not always match the research team's assumptions about how things would work. This 'not matching' has enabled the theories to be refined and added to in a way that can inform future modelling practice and help DES studies to better address the knowledge deficits that can have such a negative impact on health outcomes. In order to minimise interpretation bias and create generalisable knowledge to improve simulation interventions, more systematic prospective evaluation is needed; this is likely to include specialist knowledge different to that of a typical expert in simulation.