Introduction

The new ICH E9(R1) Addendum on Estimands and Sensitivity Analyses in Clinical Trials was adopted by the ICH in November 2019 [1]. ICH E9(R1) introduces the ‘estimand framework’ for clinical trials, which aims to ensure clarity in the description of treatment effects. Many authors have since described the concepts, such as Ratitch et al. [2, 3], Keene et al. [4] and Clark et al. [5], who provide tutorial-like descriptions of the new framework together with case study examples.

The implementation of the ICH E9(R1) Addendum (hereafter referred to as the ‘addendum’) requires adaptation of previous ways of working. Typically, the process starts with awareness building and education, followed by implementation and then further exploration of the potential of the framework, as depicted in Fig. 1.

Fig. 1 A typical approach to estimand implementation

In October 2019, an Estimand Implementation Working Group (EIWG), sponsored by the European Federation of Statisticians in the Pharmaceutical Industry (EFSPI) and the European Federation of Pharmaceutical Industries and Associations (EFPIA), was initiated to support the implementation of the final addendum through sharing experiences across industry. The aims of the EIWG include (i) sharing recommendations for best practices and learnings, (ii) consolidating issues and topics for discussion and (iii) raising awareness of the value of the estimand framework across industry and beyond. The EIWG currently consists of clinicians and statisticians from 20+ pharmaceutical and consultancy companies, as well as members employed by regulatory agencies.

This paper marks two years since the finalization of the addendum and describes the EIWG’s experiences and learnings from implementation of estimands in industry sponsored clinical trials. We begin with a description of our initial views about the opportunities that the addendum presented and the perceived challenges at that time. We then describe our journey of implementation over the last two years, through which we note some key lessons learned and unexpected challenges. Finally, we provide some thoughts for the future.

Two Years Ago, and the Release of the Final Addendum

In October 2019, just prior to ICH publishing the final addendum, the EIWG held its first workshop to share companies’ experiences of the estimand framework since the draft addendum was released in August 2017. At that point, the level of adoption of the estimand framework varied widely across companies. For example, some had already started awareness campaigns and had run internal training courses, whereas others were only just starting to kick off discussions. In Sects. 2.1 and 2.2, we describe the opportunities and challenges as discussed during the workshop.

The Perceived Opportunities

At the workshop it was recognized that one of the fundamental purposes of the addendum was to ensure clear communication in the trial protocol of the objectives and targeted treatment effects. Improved transparency at the protocol design stage was expected to increase alignment across stakeholders, particularly between regulators and sponsors, thus leading to more efficient assessment of clinical trial data when submitted as part of a regulatory submission. Historically there was a tendency to design a clinical trial with endpoints aligned to objectives but whereby a plethora of analyses would be performed for each endpoint, which often addressed different unspecified clinical questions. It was expected that the new estimand framework would facilitate a more structured approach to ensuring treatment effects of interest were aligned to specific clinical questions and described by clear clinical objectives. Each treatment effect to be estimated would have an aligned method of analysis and aligned sensitivity analyses which would allow the assessment of robustness and relevance of the assumptions underpinning the estimation methods.

It was also expected that the addendum would impact trial conduct in terms of the required follow-up of patients and also how clinical trials are reported and interpreted. Additionally, the clear specification of an estimand would provide an opportunity for other stakeholders, such as health technology assessment bodies, healthcare practitioners and patients to have greater clarity on the treatment effects being targeted. It would therefore provide a platform for feedback and discussion about what may be relevant from their perspectives.

Notably, the addendum introduced the concept of intercurrent events, defined as “Events occurring after treatment initiation that affect either the interpretation or the existence of the measurements associated with the clinical question of interest” [1]. By requiring that these would be explicitly identified and addressed within trial planning, conduct and analysis, it would require stakeholders to consider their relevance to, and impact on, assessment of the clinical question of interest. As there were now different possible strategies to address intercurrent events, the chosen strategy(ies) would have to be described in the protocol. This leads to clear communication in the trial protocol of the objectives and targeted treatment effects.

The addendum provided the framework and gave the freedom to choose different estimands for a clinical trial, provided that the rationale for the choice was appropriate. Taking this to its logical conclusion, Keene et al. [4] subsequently argued that intention-to-treat (ITT) analyses would not always be the answer for estimating treatment effects in clinical trials. The estimand framework provided an excellent basis for comparison of the objectives, estimands and analyses used in different clinical studies. As such, meta-analyses should be less likely to suffer from combining estimates of different estimands, leading to more consistent and coherent meta-analyses linked to a common target estimand.

One of the most impactful aspects perceived for the estimand framework was the estimand thinking process itself. Although this was not explicitly discussed in the addendum, it was presented in the official ICH E9(R1) training slides under module [6]. The estimand thinking process established a clear order of approach, beginning with the therapeutic setting and intent of treatment to determine the objective. As an important second step, intercurrent events would be identified and strategies for handling them chosen. Subsequently, the estimand with its five attributes could be constructed and the chosen estimand(s) would be documented in the protocol. The thinking process is a tool which was expected to help establish a clear link between the trial objective, estimand, study design and statistical analysis, as discussed by Ratitch et al. [2, 3] and Mallinckrodt et al. [7]. The addendum, similar to ICH E9 [8], had a focus on confirmatory trials, but presented tremendous utility for all clinical trials in terms of the disciplined thinking that it brought to trial design, data collection, analysis, and interpretation.
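As an illustration of how the outcome of the thinking process might be recorded, the following minimal Python sketch stores the five estimand attributes, with one handling strategy per intercurrent event. The class name, field names and example values are hypothetical and are not taken from the addendum or from any protocol template; only the five attributes and the strategy names follow the framework.

```python
from dataclasses import dataclass
from typing import Dict

# Strategy names used in the estimand framework.
STRATEGIES = {
    "treatment policy",
    "hypothetical",
    "composite",
    "while-on-treatment",
    "principal stratum",
}


@dataclass(frozen=True)
class Estimand:
    """Hypothetical container for the five attributes of an estimand."""

    treatment: str
    population: str
    variable: str
    intercurrent_events: Dict[str, str]  # intercurrent event -> chosen strategy
    population_summary: str

    def __post_init__(self) -> None:
        # Every intercurrent event must be paired with a recognized strategy.
        for event, strategy in self.intercurrent_events.items():
            if strategy not in STRATEGIES:
                raise ValueError(f"Unknown strategy {strategy!r} for event {event!r}")


# Illustrative values only; not drawn from any real trial.
primary = Estimand(
    treatment="Test drug versus placebo, each added to background therapy",
    population="Adults with the target condition meeting the entry criteria",
    variable="Change from baseline to week 26 in the outcome measure",
    intercurrent_events={
        "treatment discontinuation": "treatment policy",
        "use of rescue medication": "hypothetical",
    },
    population_summary="Difference in means between treatment groups",
)
```

Writing the intercurrent events down as explicit event and strategy pairs makes it harder to leave an anticipated event unaddressed at the design stage.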

The Perceived Challenges

ICH E9(R1) was the first addendum to ICH E9 [8] since it was introduced over 20 years ago. However, the principles introduced in the addendum are not solely statistical in nature, but also address clinical aspects of study design, study conduct and reporting study results. As a description of what we want to estimate, estimands can only be defined after definition of clinical trial objectives and consideration of the clinical questions of interest. But as the addendum was a revision to ICH E9, rather than to ICH E8(R1) General Considerations For Clinical Studies [9], there was a danger that its contents would be perceived as statistical in nature. It was recognized that it would be a challenge to change mindsets and ensure all clinical researchers considered specification of estimands as a shared goal. This was one of the main motivations for creation of the EIWG ‘Estimands Academy for Trial Teams’, as discussed in Sect. 3.1.

The estimand framework has enabled different strategies to be chosen to address the intercurrent events identified for a given estimand, and it was recognised this could bring complexity in how best to describe estimands in these situations. In some cases, clinical researchers have decided to label estimands, for example ‘hybrid estimand’, but this has not provided the desired level of clarity to describe the treatment effect of interest. Also, where estimands have employed a diverse range of strategies to address intercurrent events, statisticians were aware it would be more challenging to identify suitable analysis methods to estimate estimands with minimal bias, and also the interpretation of results would become more difficult.

Another recognized challenge at the time the estimand framework was introduced was the meaning of sensitivity analyses and supplementary analyses. It was not clear whether a supplementary analysis had to be aligned to the same estimand or could be aligned to another estimand that provided further understanding of a treatment effect. The addendum defined sensitivity analyses as “a series of analyses conducted with the intent to explore the robustness of the inferences from the main estimator to deviations from its underlying assumptions and limitations in the data”. An example of a sensitivity analysis would be the use of a tipping point analysis [10] to explore the potential violations of assumptions made in the model with respect to the missing data mechanism used to estimate the primary estimand. In contrast, supplementary analyses were defined as “a general description for analyses that are conducted in addition to the main and sensitivity analysis with the intent to provide additional insights into the understanding of the treatment effect”. For some clinical researchers, ‘of the treatment effect’ was interpreted to relate to the same estimand with no changes to any of the attributes, whilst other clinical researchers interpreted this to require a different, but related, estimand with at least one attribute changed, with its estimation still helping interpret the same treatment effect. One example might be if the primary estimand targets a continuous variable that is summarized by the difference between two treatment groups in the mean change from baseline in the primary outcome variable. One might ask a broader question of the clinical relevance of the effect observed, and this could be addressed by a responder analysis. Clearly this broader question is strongly related to the initial question but is subtly different. This is an example of a different, but related, supplementary question which addresses the same overall clinical question of efficacy with respect to this endpoint but from a different perspective. In doing so it requires an additional estimand. The training slides recently updated by ICH [6] stated that supplementary analyses should in general be given lower priority relative to sensitivity analyses. The EIWG will continue to monitor the approaches used by teams as more experiences are shared.
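The idea behind a tipping point analysis can be sketched in a few lines of Python. This is a deliberately simplified illustration: missing outcomes are replaced by the observed arm mean shifted by a penalty delta, and the 'tipping point' is taken as the first delta at which the point estimate of the treatment difference no longer exceeds a margin. In practice such analyses are usually based on multiple imputation and on statistical significance rather than on the sign of a point estimate. All function names and data values are hypothetical.

```python
from statistics import mean


def arm_mean_with_delta(observed, n_missing, delta):
    """Arm mean when each missing value is imputed as (observed mean + delta)."""
    obs_mean = mean(observed)
    total = sum(observed) + n_missing * (obs_mean + delta)
    return total / (len(observed) + n_missing)


def tipping_point(active_obs, n_missing_active, control_obs, deltas, margin=0.0):
    """Scan deltas in the order given; return the first (delta, difference)
    at which active minus control no longer exceeds the margin, else None."""
    control_mean = mean(control_obs)
    for delta in deltas:
        diff = arm_mean_with_delta(active_obs, n_missing_active, delta) - control_mean
        if diff <= margin:
            return delta, diff
    return None


# Hypothetical data in which higher values are better and two active-arm
# patients have a missing outcome.
active_obs = [4.0, 5.0, 6.0, 5.0]
control_obs = [3.0, 4.0, 3.0, 4.0, 3.5, 3.5]
result = tipping_point(active_obs, 2, control_obs, deltas=[0, -1, -2, -3, -4, -5, -6])
```

With these illustrative numbers the difference is 1.5 + delta/3, so the conclusion only tips once the imputed values are penalized by more than 4.5 units. Whether a penalty of that size is clinically plausible is the question the analysis is designed to raise.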

A major area of controversy in implementing the estimand framework was the lack of alignment with ICH E9 and analysis sets. Under ICH E9, an analysis set was defined, and this would identify the patients to be included. This analysis set was then used as the basis of estimation for multiple endpoints. With ICH E9(R1) each estimand would have its own set of data specifying both the patients to be included and how their observations would be used for estimation aligned to the strategies chosen to address the intercurrent events [6]. There is an ongoing collaboration between the EIWG and PhUSE (The Global Healthcare Data Science Community) to determine if there are any technical aspects or changes to CDISC data standards that may be needed to ensure the set of data used to estimate each estimand is clearly defined in analysis data sets.

Another challenge relating to historical practices for analysis sets was what role, if any, remained for a per protocol analysis set in light of the framework laid out in the addendum. In many clinical trial protocols, a per protocol analysis set would have been defined to assess efficacy of a new treatment in those patients who were able to closely adhere to the protocol. Take the example of a long-term study in which, at one visit, a patient takes a prohibited medication that is known to have a short-term benefit. Inclusion of all data collected for that patient before that intercurrent event, and indeed collected after a suitable period of time after that intercurrent event, would still provide useful information to characterize the benefit of the drug in many circumstances. Hence, excluding such randomized patients entirely would generally not be a favored option. The addendum also does not support the use of per protocol analysis sets to estimate treatment effects in subjects who are able to adhere to a treatment: a per protocol analysis does not address the issue that a patient could adhere to one treatment but not an alternative one. This means that firstly, the target patient population is not clearly defined without further clarification, and secondly, however it is defined, a simple analysis on the per protocol set will be biased as the evaluated patients in each arm may not be entirely comparable. This aspect could now be addressed by targeting a principal stratum approach [11,12,13,14,15], although the number of assumptions needed to estimate such an estimand would not make it an attractive choice in most circumstances. It was recognised that the role of per protocol analysis sets would need more attention and discussion.

The addendum discussed how the estimand framework may impact the conduct of a study. The choice of estimand and strategies for addressing intercurrent events could affect the duration of follow-up to ascertain outcomes of interest, strategies to allow changes to existing medications and/or the use of rescue medication, and strategies for retaining subjects in a trial following a decision to stop the investigational product(s). Since the National Research Council report on missing data in 2010 [16] there had already been a great focus on reducing missing data in clinical trials; however, in clinical trial practice it remained a challenge to retain patients until the end of a trial, or at least until the primary outcome of interest had been observed.

Questions were also raised about how intercurrent events related to protocol deviations. In some cases, they overlapped; for example, an important protocol deviation might be an unexpected dose interruption whilst taking the investigational product, and this could also be considered an intercurrent event. The definition, identification and reporting of important protocol deviations should continue, adhering to good clinical practice to minimize their occurrence as the number and extent of important protocol deviations could be a surrogate for whether a trial was conducted to high quality and where trial integrity was maintained.

Finally, Section A.6 in the addendum provided brief information on the type of information that should be provided in the clinical trial report, including summaries of the number and timings of each intercurrent event in each treatment group. However, no guidance on how best to do this was provided, and this was an example where more advice was needed to support the implementation of the addendum. For situations when potential imbalances in the occurrence of intercurrent events between treatment conditions are likely, for example in Chimeric Antigen Receptor T-cell therapies [17], the need for additional analyses (and potentially different estimands becoming of interest) should now be discussed with regulatory agencies, and where possible, agreed at the design stage of the study. See Sect. 3.4 for more discussion on how to incorporate estimands in reporting results of clinical trials.
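One possible way to produce such a summary is sketched below in Python. It counts occurrences of each intercurrent event by treatment group and reports the median study day of occurrence. The record layout, event names and values are hypothetical, and this is only one of many reasonable presentations rather than a recommended format.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (patient id, treatment group, intercurrent event, study day).
records = [
    ("P01", "active", "treatment discontinuation", 30),
    ("P02", "active", "rescue medication", 45),
    ("P03", "control", "treatment discontinuation", 20),
    ("P04", "control", "treatment discontinuation", 60),
    ("P05", "control", "rescue medication", 15),
]


def summarize_intercurrent_events(records):
    """Number of occurrences and median study day per (group, event) pair."""
    days = defaultdict(list)
    for _patient, group, event, day in records:
        days[(group, event)].append(day)
    return {
        key: {"n": len(values), "median_day": median(values)}
        for key, values in sorted(days.items())
    }


summary = summarize_intercurrent_events(records)
for (group, event), stats in summary.items():
    print(f"{group:8s} {event:28s} n={stats['n']} median day={stats['median_day']}")
```

Note that this sketch counts event occurrences rather than patients; if a patient can experience the same event more than once, a per-patient count (for example, time to first occurrence) may be more appropriate.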

The Implementation Experience So Far

Many of the larger pharmaceutical companies (and industry partners) started the estimand journey with an awareness campaign to clarify the definition of an estimand, promote the value and benefits of the framework and highlight the regulatory expectations (particularly for confirmatory studies). This was often followed by more in-depth and formal training that in some cases was targeted at individual line functions and in others at a cross-functional audience. To assist trial teams with implementation of estimand language in protocols, protocol template text was typically developed, and the statistical community started discussions about the best methods and approaches for estimation to target causal estimands. Some sponsors are now starting to gain experience of the way that estimand concepts may change reporting of results and are also exploring the utility of the framework as a tool to help establish important estimands which may be beneficial to other stakeholders beyond health authorities (e.g. health technology assessment bodies).

In this section we provide more insights to the aspects of implementation and highlight some recommendations and points to consider from an EIWG perspective.

Awareness and Education

As discussed in Sect. 2.2, one of the key challenges faced by industry sponsors was how to engage with clinical researchers, given that the knowledge base of the estimand framework lay largely within the statistical community. Experience has shown that it is beneficial to first raise awareness of estimand concepts across a broad range of functions including clinical, regulatory affairs, statistics, trial operations, medical writing, statistical programming, etc. to promote initial cross-functional engagement. If cross-functional ‘supporters’ of the estimand concepts can be identified at an early stage, it is then possible to continue the engagement by working as a cross-functional team to develop learning solutions as well as to help with training facilitation, thus proliferating the message that estimands require cross-functional discussion. This is facilitated further by running training sessions for cross-functional audiences.

Another key aspect which can help to facilitate learning is the use of case studies, particularly those discussed with regulators. Creating an archive of real-life case studies together with any regulatory feedback can help to provide compelling material for awareness presentations and training material, and also makes it possible to track trends in health authority views. There has been positive experience with the use of case studies as part of training programs: the background of the case study is provided, and teams are then organized in break-out groups to discuss the case, using the estimand thinking process as a way to rationalize the primary estimand. This method of training can become even more powerful if real regulatory feedback can also be shared, thus giving attendees interesting material for debate and discussion. As internal experience is gathered, sharing of case studies through seminars also helps to facilitate continuous learning.

As an additional support to trial teams, many of the companies involved in the EIWG have also seen great benefits in setting up support sessions to allow access to subject matter experts on estimand topics in order to debate and pressure test the choice of estimand. Table 1 provides insights into typical questions which may be expected from associates at different stages of learning and also provides some tips on training and awareness based on EIWG experiences.

Table 1 Typical questions raised at each phase of learning and EIWG recommendations

One of the key priorities of the EIWG was to make the estimand framework accessible to all stakeholders involved in clinical trials design or decision-making. A series of webinars based on real-life case study examples have been developed [18,19,20], and are freely available in the public domain through a video-on-demand library called “The Estimands Academy for Trial Teams” (https://psiweb.org/vod/Index/). Future webinars will focus on different disease areas with the objective of providing case study examples to illustrate implementation of the estimand framework.

Implementing the Estimand Framework in Protocols

One of the key challenges in implementing the addendum in clinical trials is how best to describe estimands. As a key part of the scientific content of a trial, they need to be included within the clinical trial protocol (CTP), yet the addendum does not provide guidance on how this should be done. CTPs are usually written based on a template, but currently there is no single harmonized template available. However, some cross-industry/academia examples are available, including the TransCelerate [21] and NIH/FDA [22] templates, and the ICH M11 Working Group [23] is developing, for the first time, a new standard protocol template. Many sponsors have also developed their own in-house CTP templates and continue to update them in line with regulatory guidance and industry best practices. To support these activities, the EIWG has a workstream looking at providing guidance on how to incorporate the estimand framework in CTP templates. The EIWG has submitted a publication to share best practices and recommendations.

There are several other hurdles to introducing estimands into protocols. Firstly, estimands impact many aspects of trial design, conduct and analysis. The estimands themselves not only need to be written into the protocol, but their impact on topics such as the choice of trial design, criteria relating to discontinuation of investigational products and/or initiating of rescue medications, and/or follow-up of patients including retaining patients in a trial to collect data beyond treatment also need to be considered. Given the importance of estimands to the scientific rationale of a trial and their many downstream consequences, estimands must therefore be described early in a protocol and not left to the statistical section or an appendix.

Secondly, implementation of the estimand framework changes both the structure and language of protocols, which have traditionally been designed around endpoints. For example, analyses are usually described as being ‘of an endpoint’. The ICH E9(R1) Addendum emphasizes that the endpoint is just one of the components of an estimand. With estimands written into a protocol, the analyses should now relate to either an estimand or its corresponding objective. These structural changes introduce a degree of incompatibility between traditional ‘endpoint-driven’ protocol templates and ‘estimand-driven’ ones. Consequently, many sponsors will have a mixture of clinical trials, some of which describe estimands and some of which do not. CTP templates are therefore still required, for a period of transition, to fit both types of trials.

Thirdly, a major challenge is how to handle objectives in protocols. The ICH E9(R1) Addendum discusses the importance of objectives in setting estimands but does not provide guidance on how best to write objectives. Clinical trial objectives can be broad and high level [9], or detailed and specific [24]. More detailed objective specification assists the choice of estimand, yet when writing a protocol containing detailed objectives, estimands and clinical questions of interest may lead to unnecessary repetition. The concept of ‘clinical question of interest’ is also raised by the ICH E9(R1) Addendum, and it is not clear how this materially differs from a specific trial objective or the estimand. However, it is clear that at least the treatment condition, population, endpoint and intercurrent events should be addressed in the clinical question of interest [1].

It may be helpful to name or number estimands to facilitate referencing within the trial protocol. Intuitively, in cases where all intercurrent events are handled with the same strategy, it is natural to use the name of that strategy. However, this naming convention does not work where different intercurrent events are handled with different strategies (so-called “hybrid” estimands, see Sect. 2.2). Care should also be taken when using such names outside the clinical trial (or clinical project), since the same name could represent several different estimands.
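The naming convention described above can be expressed as a small rule: use the strategy name only when every intercurrent event is handled with the same strategy, and otherwise fall back to a number. The Python sketch below is a hypothetical illustration; the function name and the label format are assumptions and not a proposed standard.

```python
from typing import Dict


def estimand_label(number: int, strategy_by_event: Dict[str, str]) -> str:
    """Label an estimand by number, adding the strategy name only if it is unique."""
    distinct = set(strategy_by_event.values())
    if len(distinct) == 1:
        return f"Estimand {number} ({next(iter(distinct))})"
    # Different strategies for different events: no single name describes the
    # estimand, so reference it by number only.
    return f"Estimand {number}"


single = estimand_label(1, {
    "treatment discontinuation": "hypothetical",
    "rescue medication": "hypothetical",
})
mixed = estimand_label(2, {
    "treatment discontinuation": "treatment policy",
    "rescue medication": "hypothetical",
})
```

Even in the single-strategy case the label is only a shorthand within one protocol; the full attribute-by-attribute description remains the definition.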

Statistical Analysis and Estimation

Following the release of the addendum, most discussions about the new framework have primarily centered on what is being estimated; the estimand(s). However, estimation (including both primary analysis and the new definitions of sensitivity and supplementary analyses) is a key part of the framework. The focus on estimands has led in some cases to major, perhaps unforeseen, consequences for estimation. In particular, problems have arisen mapping existing statistical analyses to estimands [25, 26]. Some commonly used statistical methods, such as Cox Proportional Hazards, have been shown to not fully correspond to any particular estimand [27]. That these difficulties exist is perhaps not surprising given that historically there was often little attempt to define what was being estimated by the statistical methods, and the typical requirements were usually that they followed ITT principles and had good statistical properties (e.g. type I error control, minimal bias, high power). To highlight the issues with adhering to ITT principles, several years before the publication of the addendum Little et al. [28] were able to propose three different estimands for continuous data that all corresponded to the ITT principle (one of which would now be described as treatment policy and two as different hypotheticals). Each of the three estimands required different estimation approaches, and yet many commonly used estimation methods can still not be considered fully aligned with any of them.

As also noted by Little et al. [28], analyses for continuous endpoints that were previously labeled intention-to-treat (ITT), but that excluded data collected after the occurrence of intercurrent events and used statistical methods such as the Mixed Model for Repeated Measures (MMRM), were more aligned to estimands using hypothetical, rather than treatment policy, strategies to address intercurrent events. In contrast, under treatment policy all data collected before and after intercurrent events should be included in the analysis. However, this alone is generally not sufficient; analyses based on treatment policy strategies should also account for the occurrence of intercurrent events rather than ignore them, as they mediate outcomes. In general, missing data handling should also reflect the values that would have been observed had they been measured, i.e. dependent upon the intercurrent events. A consequence of this is that if there are insufficient data available for patients after intercurrent event occurrences then it may not even be feasible to estimate treatment effects using a treatment policy strategy.

There is unfortunately some confusion here, due at least in part to the ICH E9(R1) Addendum contradicting itself; in Section A.3.2 under treatment policy strategy it states firstly that “the occurrence of the intercurrent event is considered irrelevant in defining the treatment effect of interest”, but in the next paragraph that “the intercurrent event is considered to be part of the treatments being compared”. Since treatments should be assumed to, at least potentially, be causal, intercurrent events clearly cannot be irrelevant or ignored. The belief that intercurrent events are irrelevant for treatment policy likely comes from the use of all observed data, i.e. irrespective of whether an intercurrent event had occurred.

In short, ‘treatment policy’ is not interchangeable with ‘ITT’, and clinical researchers need to recognize that many historical ‘ITT analyses’ do not align with analyses based on treatment policy strategies. Consequently, there is little statistical literature concerning the unbiased estimation of estimands using the treatment policy strategy, and this is an area requiring significantly more focus and attention. If complete data are available where patients have been followed in the trial and the outcomes of interest can be ascertained, using the treatment policy strategy for intercurrent events leads to all the data being included as observed and standard analysis approaches are appropriate. However, in the almost inevitable case of there being non-trivial amounts of loss to follow-up, standard analysis methods and their assumptions may be inappropriate, leading to biased estimates of treatment effects for reasons described below. In many trials, missing data are strongly correlated with intercurrent events for two reasons. Firstly, the occurrence of intercurrent events may cause missingness; patients are typically much more likely to leave a trial if they stop taking randomized treatment. Secondly, access to randomized treatment is usually dependent upon patients remaining in the trial or even attending visits (where endpoint assessments are made), so loss to follow-up and withdrawal of consent cause both missingness and treatment discontinuation where applicable. Missing data for patients who remain on randomized treatment are typically minimal. Therefore, many trials have a partitioning whereby the observed data are predominantly ‘on-treatment’, while the missing data are predominantly ‘off-treatment’. In such cases, observed data are not representative of unobserved data. Since intercurrent events also have to be assumed to be causal for the outcome, this creates a strong Missing Not At Random (MNAR) effect, which will cause estimation bias that is often considerable. Note that for patients who die in a clinical trial, measurements are impossible after death, and this is not considered a missing data problem in the addendum.
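The size of this effect can be illustrated with simple arithmetic. The Python sketch below considers a single arm in which a fraction of patients discontinue treatment, outcomes differ between on-treatment and off-treatment patients, and off-treatment patients are much less likely to be observed. It compares the true arm mean under a treatment policy strategy with the mean of the observed cases. All numerical values and function names are hypothetical and chosen only to show the mechanism.

```python
def true_arm_mean(p_disc, mean_on, mean_off):
    """Arm mean over all randomized patients, regardless of discontinuation."""
    return (1 - p_disc) * mean_on + p_disc * mean_off


def observed_case_mean(p_disc, mean_on, mean_off, obs_on, obs_off):
    """Mean of observed outcomes when the probability of being observed
    differs between on-treatment (obs_on) and off-treatment (obs_off) patients."""
    w_on = (1 - p_disc) * obs_on
    w_off = p_disc * obs_off
    return (w_on * mean_on + w_off * mean_off) / (w_on + w_off)


# Hypothetical scenario: 30% discontinue, mean outcome 10 on treatment and
# 4 off treatment; all on-treatment but only 20% of off-treatment outcomes observed.
truth = true_arm_mean(0.3, 10.0, 4.0)
observed = observed_case_mean(0.3, 10.0, 4.0, obs_on=1.0, obs_off=0.2)
bias = observed - truth

# If missingness were unrelated to discontinuation, the bias would vanish.
balanced = observed_case_mean(0.3, 10.0, 4.0, obs_on=0.9, obs_off=0.9)
```

In this illustrative scenario the true arm mean is 8.2, whereas the observed cases average about 9.5, because the observed data over-represent patients who remained on treatment.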

The MNAR effect may be greatly mitigated by analysis methods that condition on the occurrence of the intercurrent event, converting much of the MNAR issue to Missing At Random (MAR) [29]. However, these methods rely upon measuring sufficient data after intercurrent events to be able to use it to reasonably impute the missing data. Such methods have been informally referred to as ‘retrieved dropout’ approaches. Unfortunately, such approaches remain poorly studied in the literature, with only a few, recent, publications on the topic [30,31,32]. Much work therefore remains to be done to describe the characteristics, strengths, and weaknesses of the various methods to do this. One key observation is these methods require considerable recovery of post-intercurrent event data and are generally quite sensitive to how much this is achieved. Although most attention so far has focused on multiple imputation techniques, it should be noted that maximum likelihood approaches should be equally possible.
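The principle of conditioning on the intercurrent event can be sketched as follows. In this simplified Python illustration, each missing outcome is replaced by the mean of the observed outcomes from patients in the same arm with the same intercurrent event status, so that missing off-treatment values borrow from the 'retrieved dropouts'. This is single mean imputation at one time point for illustration only; the approaches discussed in the literature use multiple imputation or likelihood-based models. The function name and data are hypothetical.

```python
from statistics import mean


def impute_by_event_status(patients):
    """patients: list of (had_intercurrent_event, outcome or None) for one arm.
    Returns outcomes with each missing value replaced by the observed mean
    among patients with the same intercurrent event status."""
    observed = {True: [], False: []}
    for had_event, outcome in patients:
        if outcome is not None:
            observed[had_event].append(outcome)

    completed = []
    for had_event, outcome in patients:
        if outcome is None:
            if not observed[had_event]:
                # No data after the intercurrent event were recovered, so
                # there is nothing to condition on.
                raise ValueError("no observed outcomes in this stratum")
            outcome = mean(observed[had_event])
        completed.append(outcome)
    return completed


# Hypothetical arm: three on-treatment patients, four who discontinued,
# of whom only two had their outcome retrieved.
arm = [
    (False, 10.0), (False, 11.0), (False, 9.0),
    (True, 4.0), (True, 6.0), (True, None), (True, None),
]
completed = impute_by_event_status(arm)
naive_mean = mean(y for _, y in arm if y is not None)
conditioned_mean = mean(completed)
```

Here the observed-case mean is 8.0, while conditioning on discontinuation gives about 7.1. The explicit error when a stratum has no observed outcomes reflects the dependence of these methods on recovering post-intercurrent event data.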

The other main proposed approach to estimate treatment policy estimands is to use information from the control arm to impute the data being replaced in all arms (e.g. jump to reference, copy reference, etc.) [33]. However, these are flawed in that their estimation of treatment effect in these patients is not data driven i.e. the treatment effect itself is partially assumed instead. There is also little literature available to support the strong underlying clinical assumptions in any setting. In practice, this all means these approaches are necessarily, and often quite heavily, biased. The EIWG is conducting research to compare different modeling approaches to estimate estimands incorporating treatment policy strategies for intercurrent events in continuous endpoints and will be providing recommendations on appropriate approaches based on the simulation results.
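Why the treatment effect is partly assumed under reference-based imputation can be seen in a stripped-down, single time point version. In the Python sketch below, every missing outcome in the active arm is imputed with the observed reference-arm mean, which implicitly assigns those patients a treatment effect of zero. This is a hypothetical simplification and not the full longitudinal jump to reference or copy reference procedure; the function name and data are illustrative.

```python
from statistics import mean


def reference_based_difference(active_obs, n_missing_active, reference_obs):
    """Treatment difference when missing active-arm outcomes are imputed
    with the reference-arm mean (single time point simplification)."""
    ref_mean = mean(reference_obs)
    completed_active = list(active_obs) + [ref_mean] * n_missing_active
    return mean(completed_active) - ref_mean


# Hypothetical observed data: active arm mean 9, reference arm mean 6.
active_obs = [8.0, 10.0, 9.0, 9.0]
reference_obs = [5.0, 7.0, 6.0, 6.0]

no_missing = reference_based_difference(active_obs, 0, reference_obs)
half_missing = reference_based_difference(active_obs, 4, reference_obs)
```

In this simplification the estimated difference equals the observed difference multiplied by the proportion of active-arm patients with observed data (3.0 with no missing data, 1.5 when half are missing), whatever the unobserved outcomes actually were.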

Of the other estimand strategies, principal stratum estimation remains as challenging as initially expected given the very limited experience with these methods, but there have been a few recent published examples sharing successful implementation [11,12,13,14]. While-on-treatment estimands are typically straightforward to estimate so long as their (often strong) assumptions around, e.g. constant rates over time (of events or change), remain appropriate. The main estimation issue arising recently is around appropriate patient weighting: whether to jointly model outcomes with intercurrent event occurrences, weight estimation by inverse variance (which can bias estimates towards patients with longer follow-up), or weight equally by patient (which can inflate variance due to differential amounts of information per patient) [34, 35].
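The effect of the weighting choice can be illustrated for an event rate while on treatment. In the Python sketch below, the pooled rate (total events divided by total exposure) weights each patient by time on treatment and is used here as a stand-in for inverse-variance weighting under a constant-rate assumption; that equivalence is an assumption of this sketch. The alternative averages the per-patient rates with equal weight. Function names and data are hypothetical.

```python
from statistics import mean

# Hypothetical patients: (number of events, years on treatment).
patients = [(1, 0.25), (2, 1.0), (2, 2.0)]


def exposure_weighted_rate(patients):
    """Total events over total exposure; long follow-up dominates."""
    total_events = sum(events for events, _ in patients)
    total_years = sum(years for _, years in patients)
    return total_events / total_years


def equal_weight_rate(patients):
    """Average of per-patient rates; each patient counts equally."""
    return mean(events / years for events, years in patients)


pooled = exposure_weighted_rate(patients)
per_patient = equal_weight_rate(patients)
```

With these illustrative data the pooled rate is about 1.5 events per year, while the equally weighted rate is about 2.3, driven by the patient with only three months of exposure. The two summaries answer different questions whenever the rate and the time on treatment are related.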

Hypothetical estimands remain by far the best-understood from the perspective of estimation, despite regulatory concerns that they are vulnerable to MNAR, which is exacerbated when there are many intercurrent events leading to exclusion of data. This is because their clinical assumptions align well with simple statistical models: whereas estimation of treatment policy must consider the occurrence of intercurrent events, hypothetical estimation can simply assume they do not occur. Though the issues around MNAR remain outstanding, and always will, they are well-known and described in the literature, and can typically be appropriately addressed through a variety of sensitivity analyses [10]. However, as highlighted in the ICH E9(R1) Addendum, it is important to ensure the analysis of estimands using a hypothetical strategy is aligned to the specific hypothetical scenario under which the intercurrent event would not have occurred, as this will influence the choice of imputation. As anticipated, the use of the composite strategy is more straightforward, but it is also an area for ongoing research in the case of continuous endpoints, for example Darken et al. [36].

Communicating Estimands and Trial Results

As discussed in Sect. 2, a key benefit of the estimand framework is to increase transparency about the meaning of treatment effects being estimated in clinical trials. The PIONEER 1 study [37] is an example whereby two different estimates of two different treatment effects (estimands) were presented, but with a common primary endpoint. This new way of thinking impacted the press release, the primary manuscript, the Clinical Study Report, submission documents and prescribing information. This example demonstrates the variety of documents impacted and the number of stakeholders who must be considered for communication purposes. Many studies with estimands specified in protocols are now starting to report results, and many sponsors are struggling with how best to present estimands with the trial results. For very complex estimands, a balance needs to be struck between being sufficiently precise and ensuring the results are understood. To tackle these challenges, an EIWG sub-team has been formed to focus on transparency and reporting of trials with estimands. This team has identified several areas of focus as described below.

Given the importance of intercurrent events as part of the estimand definition, one key recommendation based on EIWG discussions is to provide specific displays in the clinical study report, such as summary tables and/or graphical visualizations, that provide a quick understanding of the number of intercurrent events and the diversity of patient journeys within the trial. This is relatively easy to implement but can have a big impact on the interpretation of study results, particularly where the treatment policy strategy has been implemented and where intercurrent events, such as use of rescue medication, become part of the treatment conditions being compared.
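As an illustration of the kind of display recommended above, the following sketch tabulates intercurrent events and patient journeys by arm. The record layout, event names and patient identifiers are all hypothetical; an actual clinical study report would derive these from the trial's analysis datasets.

```python
from collections import Counter

# Hypothetical records: (patient_id, arm, ordered intercurrent events).
patients = [
    ("P01", "active", []),
    ("P02", "active", ["rescue medication"]),
    ("P03", "active", ["rescue medication", "treatment discontinuation"]),
    ("P04", "control", ["rescue medication"]),
    ("P05", "control", ["treatment discontinuation"]),
    ("P06", "control", ["rescue medication"]),
]

def event_counts(patients):
    """Number of patients per arm who experienced each event at least once."""
    counts = Counter()
    for _pid, arm, events in patients:
        for event in set(events):
            counts[(arm, event)] += 1
    return counts

def journey_counts(patients):
    """Number of patients per arm following each distinct sequence of events."""
    counts = Counter()
    for _pid, arm, events in patients:
        journey = " -> ".join(events) if events else "no intercurrent event"
        counts[(arm, journey)] += 1
    return counts

# Print a simple summary table of patient journeys by arm.
for (arm, journey), n in sorted(journey_counts(patients).items()):
    print(f"{arm:8s} {journey:50s} {n}")
```

Counting journeys as ordered sequences, rather than only counting events, keeps visible the patients who experience more than one intercurrent event.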

Now that many sponsors are defining estimands up-front in protocols, it would be a natural step to allow descriptions of estimands to be included in trial registry databases. In May 2021, a search was carried out for Phase III clinical trials which mentioned the word ‘estimand’ in ClinicalTrials.gov. The search revealed 9 hits and in 8 of the 9 studies, the word estimand appeared under ‘outcome measure’. A similar search was carried out at the same time with the EU Clinical Trials Register; in this case there were 11 hits. Full descriptions of the estimand appeared under ‘objectives’ for one study and under ‘endpoints’ for another. Other studies referred to an estimand under the endpoint definition, but without writing out in full what the treatment effect of interest was. These searches have provided some insight into the potential value of a standard reporting mechanism for estimands in clinical trial registries, which would facilitate the full transparency expected for clinical trials. A standard approach could also help to harmonize the use of the terminology across studies. The EIWG is in the process of establishing connections to the National Institutes of Health, to raise awareness and discuss options to provide further guidance in reporting estimands.

The publishing of trial results in medical journals is another recognized challenge, where familiarity of the estimand terminology in the broader scientific community is currently limited. In recognition that the estimand framework is in tune with the philosophy of the CONSORT 2010 statement [38], the EIWG believes that there would be great value in reflecting estimands in future updates of the CONSORT guidelines. This could facilitate transparency of clinical trial results and harmonize reporting of clinical trials in the literature. For example, the CONSORT flow diagram could be easily extended to incorporate other events in the patient journey, such as intercurrent events. The EIWG have established contact with the CONSORT group and plan to develop recommendations.

Finally, in order to fully evaluate the benefit/risk of a new medicine, it will become critical to communicate estimands in the clinical overview section of the submission dossier and as part of the considerations under the structured benefit-risk framework. Ratitch et al. [2] describe some points to consider with respect to efficacy estimands incorporating intercurrent events which may reflect tolerability of treatment or key safety considerations, as this may lead to ‘double-counting the risks or incoherent conclusions’ with respect to benefit-risk evaluation. The EIWG will continue to monitor developments in this area as more experience is shared across companies.

Utility of the Framework During the COVID-19 Pandemic

The estimand framework has helped clinical researchers to assess the impact of the COVID-19 pandemic on ongoing clinical trials and proactively identify strategies to mitigate potential risks to trial conduct and planned analyses. Through the recommendations provided by Meyer et al. [39], clinical researchers were able to: evaluate if any changes were needed to pre-specified treatment effects of interest and estimand descriptions, for example introducing different strategies for addressing intercurrent events related to the pandemic compared to intercurrent events not related to the pandemic; review the planned analyses to determine if additional analyses would be needed to explore the impact of the pandemic on recruitment of patients, study conduct and data collection; and discuss any changes with regulatory agencies and obtain agreement. The estimand framework enabled clinical researchers to align quickly on any changes to study conduct and data collection to ensure trial and data integrity could be maintained, resulting in many ongoing clinical trials being able to continue to their planned completion and enabling the original trial objectives to be addressed. Lancker et al. [40] discuss hypothetical estimand strategies and provide a review of various causal inference and missing data methods, which may be needed to accommodate changes to estimands and methods for estimation to account for pandemic disruptions.

Not only did the estimand framework enable ongoing and planned new clinical trials to retain their trial and data integrity as the pandemic unfolded, it also led to an acceleration in the adoption of the estimand framework in industry-sponsored clinical trials soon after it was released as final guidance. In addition, new regulatory guidance was introduced in response to the pandemic [41, 42] specifically asking clinical researchers to assess whether objectives defined in clinical trials were appropriate, whether there was any impact on pre-specified estimands, and to plan for additional analyses to aid the interpretation of trials impacted by the pandemic. This led many sponsors to engage with regulatory agencies to align on proposed changes to pre-specified estimands for clinical trials impacted by the pandemic.

Implementation Survey

In March 2021 a survey was conducted by the EIWG to obtain feedback on experiences of implementing the addendum. Of the 577 respondents, the majority were aware of (80%) and had received training on (67%) the ICH E9(R1) Addendum. Roughly half of the respondents were statisticians and 20% were clinicians or medical leaders, with most of the experience with estimands in late phase development. This suggests statisticians have taken a lead role in the implementation of the estimand framework and the focus for implementation has been in confirmatory trials. Approximately half of those who had experience of the estimand framework had interacted with regulatory agencies, and in these regulatory interactions over half (59%) indicated they had proposed primary estimands using strategies other than treatment policy for addressing intercurrent events. This is an interesting finding as it indicates that clinical researchers are seeing a need for treatment effects of interest that are different to those used previously, which were intended to be aligned to ITT principles. The full results of the ICH E9(R1) survey will be made available in a separate publication.

Continuing the Estimand Journey

Whilst progress has been made in implementing the estimand framework into clinical development, there is some way to go until the full potential of the estimand framework is realized in day-to-day practice. We need to go beyond incorporating estimands in protocols and ensure the treatment effects are clearly described in relation to results presented in clinical study reports and at scientific conferences. Estimands also need to be incorporated into publications and clinical trial registries such as EudraCT and ClinicalTrials.gov. To achieve this goal, more attention and focus are needed to ensure a broader range of stakeholders, including investigators, ethics committees and academic centers involved in clinical research, achieve a good understanding of the addendum. For this to occur, it is crucial to develop a more intuitive “estimand language”, and one of the focus areas in the EIWG will be how to clearly describe the estimand framework in a non-technical way. Further discussion in the EIWG on naming conventions that are emerging (as discussed in Sect. 3.2) would also be useful.

To bring the framework out of the statistics corner, the detailed clinical objectives approach [24] described in Sect. 3.2 could act as a role model and facilitate discussions with key stakeholders such as clinicians, ethics committees, investigators, and patients on estimands of interest aligned to trial objectives. The EIWG is committed to further supporting this journey by continuing to provide targeted training, presentations at conferences, and publications in journals.

There are a number of important estimand applications and methodological estimation topics which need further research to enable the estimands framework to develop its full potential. Examples include estimands for safety [43], causal inference for estimands using principal stratum strategies to address intercurrent events [11,12,13,14,15], estimands in bioequivalence [44] and non-inferiority trials [45], and approaches for handling missing data for estimands using treatment policy strategies to address intercurrent events that introduce minimal bias [30,31,32]. The addendum has introduced a clear distinction between intercurrent events and missing data. This is important as the previous practice of setting data that was observed to missing and then referring to this data as a missing data problem should no longer occur. Instead, only data that is truly missing will be called as such and assumptions and imputation rules to address potential limitations of missing data will be more transparent. The addendum has also emphasized the need to explicitly state the underlying assumptions of the proposed estimation methods, where their impact on the results is evaluated through targeted sensitivity analyses aligned to the new definition.

From a clinical trial operational point of view, there is a need to continue to focus on aligning data collection with the estimand and the strategies chosen for addressing intercurrent events. For example, to assess progression-free survival time applying the treatment policy strategy for the intercurrent event of starting another anti-cancer therapy, the imaging assessments should continue to the end of the study. Despite many publications in the last decade on the importance of minimizing missing data in clinical trials, there needs to be a continued focus on training investigators, clinical trial researchers at sites and patients on what patient follow-up and data are critical to address the trial objectives and to enable the treatment effects of interest to be estimated.

As noted by Mitroiu et al. [46] in their review of EMA guidelines across four therapeutic areas, the regulators, including clinical assessors, statisticians, and other experts, are continuing on their journey in incorporating estimand thinking in clinical guidelines. It is anticipated more clinical guidelines in the future will explicitly refer to estimands and this will greatly facilitate the implementation of the addendum in more disease settings. It is important to remember that estimands will be specified to accommodate the needs of different stakeholders (e.g. regulators, health technology assessment bodies and patients). Therefore, as noted in the addendum, it is vital that sponsors meet with relevant stakeholders during the clinical trial planning stage to ensure there is alignment on the treatment effects to be estimated and on the methods for estimation. Prior to the addendum, different preferences across different stakeholders (e.g. regulators versus health technology assessors) or even across same type of stakeholders (e.g. FDA versus EMA) regarding treatment effects and/or methods for estimation were not uncommon. Unfortunately, the addendum will not solve this issue, but it allows all stakeholders to have upfront discussions using a common framework to better identify the differences of the approaches under reflection.

The number of scientific publications emerging on estimands is increasing year on year, with a recent PubMed search for ‘estimand’ conducted in October 2021 giving 330 hits. This is not unexpected given that the duration of clinical studies can span many years, but it highlights that full implementation of the addendum in the design, conduct, analysis and reporting of clinical trials will become common practice. For this to be achieved, it is essential to continue to share more case study examples, including how best to effectively and efficiently incorporate the estimand framework throughout the clinical trial process.

Last but not least, a key element of the implementation journey of the estimand framework is continuing to foster evidence-based thinking. This involves first stating the specific purpose of the trial, leading to clear trial objectives, and then defining the estimand with an aligned trial design that will enable the purpose to be addressed. Before the estimand framework, clinical researchers often started with an endpoint and derived the objective from the endpoint, which led to the mindset “the study met its endpoint” rather than “the study was able to answer its clinical question of interest”. Whilst the addendum has a focus on confirmatory trials, continuing the journey will further broaden the application of the estimand framework to all phases of clinical development, for example early phase trials such as clinical pharmacology studies, and non-interventional studies such as assessing the effectiveness of a treatment in clinical practice.

Conclusions

The last 2 years of implementing the addendum have highlighted the value and power of ‘estimand thinking’, but the estimand language and the new terminology introduced have been challenging for clinical researchers to understand. With the estimand framework, the focus is no longer about ITT analysis but clearly defining treatment effects of interest. This requires cross-functional input and alignment and it is not a purely statistical analysis problem to solve. Using case studies in training sessions has helped to illustrate how to implement the new estimand framework. Understanding which data are critical for particular estimands is important to ensure clinical trial protocols clarify what follow-up of patients is needed, including strategies for minimizing missing data, so treatment effects can be estimated with minimal bias allowing trial objectives to be answered. The new definition of sensitivity analysis has ensured these analyses are now aligned to each estimand.

It will be a long journey before all stakeholders involved in clinical research fully understand the estimand framework and are able to implement it broadly. But so much has been learned and shared based on the experiences gained on the journey so far. Early and proactive engagement across key stakeholders is leading to earlier alignment on the treatment effects of interest and methods of estimation including appropriate sensitivity analyses. Table 2 provides a summary of key EIWG recommendations to continue the journey for implementing the addendum and to realize the full value of the estimand framework based on the experiences learned thus far.

Table 2 EIWG recommendations for Implementing the addendum

In conclusion, there is a Japanese saying [47] “He who would go a hundred miles should consider ninety-nine as halfway”. So, it is here that a long breath is needed and there is more important work to be done to support broader implementation of the addendum.