What type of cluster randomized trial for which setting?

The cluster randomized trial allows a randomized evaluation when it is either not possible to randomize the individual or randomizing individuals would put the trial at high risk of contamination across treatment arms. There are many variations of the cluster randomized design, including the parallel design with or without baseline measures, the cluster randomized cross-over design, the stepped-wedge cluster randomized design, and more recently developed variants such as the batched stepped-wedge design and the staircase design. Once it has been clearly established that there is a need for cluster randomization, an ever-important question is which form the cluster design should take. If a design in which time is split into multiple trial periods is to be adopted (e.g. as in a stepped-wedge design), researchers must decide whether the same participants should be measured in multiple trial periods (cohort sampling) or whether different participants should be measured in each period (continual recruitment or cross-sectional sampling). Here we outline the different possible options and weigh up the pros and cons of the different design choices, which revolve around statistical efficiency, study logistics and the assumptions required.


Background
Cluster randomized trials randomize entire clusters of participants to intervention conditions. This is in contrast to individually randomized trials, where individuals themselves are randomized. In cluster randomized trials, clusters could be hospitals, units or wards within hospitals, entire geographical regions, schools, families, etc. Careful consideration and justification are required before opting for the use of cluster randomization, as it increases the required sample size, increases the risk of bias and has specific ethical considerations [1,2]. Cluster randomization is necessary when the intervention is applied at the cluster level but outcomes are measured on the individuals within the cluster, or when individual randomization would lead to unacceptable levels of contamination (i.e. the treatment applied to one individual would influence the treatment or outcome of another individual) [1].
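To give a sense of how much cluster randomization can increase the required sample size, the sketch below uses the standard textbook design effect for a parallel design, 1 + (m - 1) × ICC, where m is the cluster size and ICC is the intra-cluster correlation. This formula is not taken from the text above, and it assumes equal cluster sizes and a single measurement period. The function names and the example values are hypothetical.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation for a parallel CRT, assuming equal cluster sizes."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (cluster_size - 1) * icc


def clusters_per_arm(n_per_arm_individual, cluster_size, icc):
    """Clusters needed per arm, starting from the per-arm sample size
    that an individually randomized trial would require."""
    inflated = n_per_arm_individual * design_effect(cluster_size, icc)
    return math.ceil(inflated / cluster_size)


if __name__ == "__main__":
    # Hypothetical inputs: 20 participants per cluster, ICC of 0.05.
    print(design_effect(20, 0.05))
    print(clusters_per_arm(200, 20, 0.05))
```

Even a small ICC can inflate the sample size noticeably when clusters are large, which is one reason the choice of cluster randomization needs justification.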
In the standard two-arm cluster randomized design, clusters are randomized in a 1:1 allocation ratio to either the intervention or the control condition [Fig. 1a]. In Fig. 1 the clusters are represented as being allocated to "sequences", which are more commonly referred to as "arms" in parallel designs. There are many different variations of cluster randomized trials, many of which divide the trial up into multiple periods [3,4]. Some of these variations are similar to the sorts of variations available when using individual randomization, while others are unique to cluster randomization. The parallel cluster trial with a baseline period is perhaps the simplest of the variations [Fig. 1b]. Similar to the individually randomized trial, this design simply requires a baseline measure of the outcome from participants before randomization, in addition to the measure taken after randomization, although typically in cluster trials different participants are measured in the two periods (see below). Another variation is to switch between control and intervention conditions, potentially multiple times over multiple time periods (often referred to as sequences of treatments, hence the use of the term sequence) [Fig. 1c]. Such a design has an individually randomized analogue in individual-level crossover trials, where individual patients cross between different treatments, with randomization to the sequence (order) of treatments received. In the case of randomizing clusters this is known as the cluster randomized crossover design (sometimes referred to by the acronym CRXO), with clusters rather than participants crossing between intervention conditions. Similar to individually randomized designs, this might simply consist of a single switch between treatment and control, or of randomization to a sequence of switches, so that at set times clusters transition between receiving the intervention or control condition [5]. In the stepped-wedge cluster randomized trial (SW-CRT), clusters are randomized to a sequence which dictates the period of transition from control to intervention condition. This is again a design in which time is split into a number of periods, but one where every cluster ultimately receives the intervention condition [Fig. 1d]. Variations of the SW-CRT include the batched stepped-wedge design (a series of SW-CRTs run in batches) and the staircase design [6,7].
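The designs described above can be written down as small sequence-by-period matrices, with 0 for the control condition and 1 for the intervention condition. The following is a minimal sketch of that idea only: the function names, the default numbers of sequences and periods, and the choice of one sequence crossing per period in the stepped-wedge are illustrative assumptions, not a reproduction of Fig. 1.

```python
import numpy as np


def parallel_design(n_sequences=2, n_periods=1):
    """Parallel CRT: half of the sequences receive the intervention throughout."""
    design = np.zeros((n_sequences, n_periods), dtype=int)
    design[n_sequences // 2:, :] = 1
    return design


def parallel_with_baseline(n_sequences=2, n_followup=1):
    """Parallel CRT preceded by one baseline period in which all are control."""
    baseline = np.zeros((n_sequences, 1), dtype=int)
    return np.hstack([baseline, parallel_design(n_sequences, n_followup)])


def crossover_design(n_periods=2):
    """Two-sequence CRXO: the sequences alternate conditions in opposite order."""
    first = np.arange(n_periods) % 2  # 0, 1, 0, 1, ...
    return np.vstack([first, 1 - first])


def stepped_wedge(n_sequences=3):
    """Standard SW-CRT: an all-control first period, then one sequence
    crosses to the intervention in each subsequent period."""
    periods = np.arange(n_sequences + 1)
    switch_period = np.arange(1, n_sequences + 1).reshape(-1, 1)
    return (periods >= switch_period).astype(int)


if __name__ == "__main__":
    for name, design in [
        ("parallel", parallel_design()),
        ("parallel with baseline", parallel_with_baseline()),
        ("CRXO", crossover_design()),
        ("stepped-wedge", stepped_wedge()),
    ]:
        print(name)
        print(design)
```

Rows are sequences and columns are periods, so reading along a row gives the order of conditions that clusters allocated to that sequence receive.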
Citation analyses show an increasing adoption of these variations to the parallel CRT [8,9]. However, care is needed when choosing among these variations of cluster randomized designs. Important considerations include the required sample size (number of clusters, number of time periods, number of participants per cluster), often referred to as statistical efficiency; how many clusters are exposed to the intervention condition; the risk of bias (including carryover effects or within-cluster contamination, identification and recruitment biases, and biases due to secular trends); and whether to measure different or the same participants in each of the different time periods. Our objective here is to provide some guidance on how to choose between these different designs.

Choices around different sampling structures
If a design in which time is split into multiple trial periods is to be adopted, the same or different participants might be measured in the different trial periods. If clusters are large and data collection incurs costs, then researchers also need to decide whether to measure all individuals in clusters or just a random sample in each. Where the same participants are measured in each trial period, this is known as closed cohort sampling. Alternative sampling structures, in which participants provide only one measurement throughout the entire trial, are possible. A cross-sectional sampling structure occurs when samples of different participants are taken from each cluster in each period. A continuous recruitment structure involves different participants entering the trial during each period of time [10]. Alternatively, in some situations some participants may provide measurements in more than one period and others only once; this is known as an open cohort sampling structure [11,12].
For some trials the setting determines how participants will be measured. For example, in the LUSTRUM trial participants enter the trial on diagnosis of chlamydia at a clinic (a trial cluster) and their participation in the trial ends soon after [13]. In this setting recruitment is inevitably continuous and, unless an individual is diagnosed more than once, different participants will be measured in different periods. In other settings, however, researchers will need to make a choice of sampling structure. For example, in a school trial with measurements taken initially in year 7, options include follow-up one year later by measuring the same children (now year 8) or different children in the same school year (new children, year 7). Here we may prefer to measure the same children again, since this allows us to directly measure within-participant change and could increase power to estimate intervention effects. On the other hand, this sampling structure may be more susceptible to bias if, for example, the repeated assessment itself induces a change in behavior, or representativeness declines due to attrition (children leaving the school between school years) [14].
In still other settings, researchers must choose between a cohort and a cross-sectional sampling approach. For example, suppose we wish to evaluate an intervention through a baseline and an endline survey. If entire clusters are exposed to the intervention and clusters are open cohorts (e.g., communities) with a gradual rate of people leaving and joining clusters (i.e., 'churn'), then the two sampling approaches target two slightly different estimands: the cohort approach targets those exposed from the start (but is subject to dropout from people moving away), whilst the cross-sectional approach targets the broader cluster effect (i.e., the 'culture change') with a mixture of individual exposure durations. In settings where the 'churn' is low, the two estimands will be similar, and the choice of sampling approach is often based on logistical issues or statistical efficiency.

When to use the cluster randomized cross-over (CRXO)?
We start by considering when to use the cluster randomized cross-over design. This design features first in our list because it can be very statistically efficient: by incorporating cross-overs, the CRXO design allows each cluster to act as its own control, and as a result can recover some of the efficiency lost through cluster randomization. Indeed, if correlations between outcomes are expected to decay smoothly over time, then it has been shown that increasing the number of crossovers while maintaining the trial duration can increase the efficiency of the CRXO design [5]. However, the CRXO design is often not a reasonable choice when, for example, the intervention involves any change in human behavior, such as education [15]. In addition, in settings where the intervention cannot be completely removed, the cross-over design can put the study at risk of carryover effects (that is, the control observations can become exposed to the intervention condition). Although steps can be put into place to mitigate the biases which arise from carryover effects, such as washout periods, these can both increase the duration and complexity of the study and leave doubts about whether the intervention truly brought about any observed change. The CRXO design should therefore only be used when the intervention can be removed after roll-out and the cluster can return to its pre-trial state.
The PEPTIC trial is an example of a cluster randomized cross-over trial, in which 50 clusters (including >26,000 participants) switched between two treatments in common use for stress ulcer prophylaxis in intensive care patients receiving invasive mechanical ventilation [16]. The crossover design was considered appropriate because switching between interventions was straightforward and because different patients were included in each period, which mitigated any possible carryover effects from the treatment administered first.

When to use a design with baseline measures?
When cross-over is not possible or would likely put the study at risk of bias due to carryover effects, uni-directional cross-over designs, such as the parallel CRT with a baseline period, can also increase statistical efficiency over the simple parallel design. The CRT with baseline design is particularly useful when the outcomes are ascertained from routinely collected data (and so have limited cost or delay implications). Again, the magnitude of the increase in statistical precision depends on within-cluster correlations, the strength of correlations between cluster-periods, and cluster size [17]. It is also worth noting that the cluster trial with a baseline measure will never be as statistically efficient as the CRXO design. An example of a CRT with a baseline period that uses cohort sampling is a CRT in which 10 clusters were randomized to either an unconditional or a conditional cash transfer program to determine the effects on child health; here the same participants were measured before and after randomization [18]. An example of a CRT with a baseline period that uses cross-sectional sampling is a trial in which schools were randomly allocated to control or a school-based mindfulness program to determine the effect on well-being; here different children were measured before and after randomization [19].

When to use the SW-CRT?
The stepped-wedge CRT can also be a pragmatic and statistically efficient design choice [20,21]. In the standard stepped-wedge CRT, all clusters must start the trial at the same time, and all must follow the pre-specified (randomized) intervention roll-out schedule. The knowledge that all clusters will eventually receive the intervention can enhance stakeholder engagement and participation in a SW-CRT. However, this appeal must be balanced against the knowledge that, at the time of implementing the study, it will be unknown whether the intervention is effective [22]. The SW-CRT can also have an appeal where rationing or staggering the roll-out of the intervention is needed, for example because there are insufficient resources to roll out the intervention to all clusters simultaneously (although this can also be achieved under a parallel CRT [4]).
This design has had a large uptake in recent years, but given the difficulties that occur with adherence to the strict scheduling of this design, it is often not the most appropriate design choice [23]. In addition, because it is not possible to increase the duration of the study once it has started, the design can also be at risk of under-recruitment [24]. Further, the stepped-wedge design induces confounding between the intervention and time. This is because the observations taken under the control condition are collected systematically earlier in calendar time than those taken under the intervention condition [Fig. 1d]. To mitigate this, it is necessary to adjust for time effects at the analysis stage, but these adjustments require assumptions (which cannot be verified), such as that the effect of time is linear or that the effect of time is the same across all clusters.
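The confounding between intervention and time can be seen directly from the layout of a stepped-wedge design. The sketch below builds a hypothetical standard stepped-wedge with four sequences and five periods (an illustrative choice, not a design from the text) and compares the average period index of control and intervention cluster-periods.

```python
import numpy as np

# Hypothetical standard stepped-wedge: 4 sequences, 5 periods,
# one sequence crossing to the intervention in each period after the first.
n_sequences = 4
periods = np.arange(n_sequences + 1)
switch_period = np.arange(1, n_sequences + 1).reshape(-1, 1)
design = (periods >= switch_period).astype(int)  # 1 = intervention

# Period index of every cluster-period cell, in the same shape as the design.
period_grid = np.broadcast_to(periods, design.shape)

mean_period_control = period_grid[design == 0].mean()
mean_period_intervention = period_grid[design == 1].mean()

# Share of sequences under intervention in each period rises from 0 to 1.
share_treated_by_period = design.mean(axis=0)

if __name__ == "__main__":
    print(design)
    print("mean period, control cells:", mean_period_control)
    print("mean period, intervention cells:", mean_period_intervention)
    print("share under intervention by period:", share_treated_by_period)
```

Because control cells sit earlier on average than intervention cells, any secular trend in the outcome would be attributed to the intervention unless time is accounted for in the analysis.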
So, whilst statistical efficiency can appear to increase under the SW-CRT design (again, the increase depends on the strength of correlation and the size of clusters) [25], and it can have a certain amount of pragmatic appeal, it is a design that should be adopted with caution and recognition of the assumptions made about time trends. Consider the SW-CRT across 121 hospitals in 10 different countries, evaluating a bundle of interventions for acute cerebral hemorrhage, including observations from more than 10,000 participants [26]. In this trial, standard adjustment for calendar time was implemented in the analysis, making the assumption that secular trends in the outcome (functional recovery) were identical across all 10 countries. An assumption of a common secular trend across 10 countries might be questionable. If this assumption was not tenable, the resulting effect estimate might be biased. Including homogeneous clusters, such as only clusters from within the same country, where calendar time effects (i.e. secular trends) are more likely to be similar, would likely make assumptions about common trends more tenable [27].
While in the standard stepped-wedge design all of the clusters must be ready to initiate the study at the same point in time, the batched stepped-wedge design relaxes this requirement to a degree, because the trial is conducted in "batches" of clusters, each following a "mini" stepped-wedge design [6,23]. Further, the standard stepped-wedge design requires that all clusters participate in the trial for the entire trial duration; alternatives that do not require this include the dogleg design and the staircase design [7,28].

Fig. 1. A schematic representation of different types of cluster randomized trials.