Current practice in the measurement and interpretation of intervention adherence in randomised controlled trials: A systematic review

Background: Ideally, all participants in a randomised controlled trial (RCT) should fully receive their allocated intervention; however, this rarely occurs in practice. Intervention adherence affects the risk of Type II error and so influences the interpretation of trial results and subsequent implementation. We aimed to describe current practice in the definition, measurement, and reporting of intervention adherence in non-pharmacological RCTs, and how this information is incorporated into the analysis and interpretation of trial results.


Background
While randomised controlled trials (RCTs) remain the optimal study design for evaluating the effectiveness of an intervention, research to understand and improve the design, delivery, and analysis of RCTs is required in tandem [1]. This includes the concept of adherence: the extent to which participants receive their allocated intervention as intended. Deviations from intended interventions can increase the risk of Type II errors (incorrectly failing to reject the null hypothesis) [2], and therefore impact the interpretation of trial results and, subsequently, implementation decisions.
The Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) guidelines recommend that trials should plan and implement robust processes for monitoring adherence and describe these in the protocol [3]. This may include the measurement of adherence and how this will be collected, whether there is a defined acceptable minimum adherence level, and a rationale for these decisions.
However, defining and quantifying intervention adherence within the context of individual trials can be challenging. Within pharmacological trials, there are standardised published guidelines to enhance the quality of measuring and reporting adherence [4], e.g. the ESPACOMP Medication Adherence Reporting Guideline (EMERGE) [5]. However, the same cannot be said for non-pharmacological interventions. There are multiple, and sometimes conflicting, definitions of the term 'adherence' within the literature, but one of the more widely used is the WHO definition [6]: "the extent to which a person's behaviour (taking medication, following a diet, and/or executing lifestyle changes) corresponds with agreed recommendations from a health care provider". The Consolidated Standards of Reporting Trials (CONSORT) statement, Consensus on Exercise Reporting Template (CERT) and Template for Intervention Description and Replication (TIDieR) checklists outline the need for RCTs of non-pharmacological treatments to report detailed information about the intervention, including adherence at both participant and care provider levels [7][8][9]. Whilst it is not uncommon to see the concept of adherence integrated into the reporting of RCTs, measures of adherence appear to be highly variable, with no consensus on best practice. A recent review demonstrated that, particularly in trials of complex interventions, measurements of adherence were highly variable in both quality and content [10]. The literature is plagued with inconsistencies in the terminology used and varied definitions of adherence [11], e.g. use of the terms compliance, fidelity, engagement, etc., each with slightly differing connotations.
Current literature recommends that, in addition to the intention-to-treat (ITT) analysis, a statistical sensitivity analysis should be performed for RCTs which aims to estimate the treatment effect among participants considered to have good intervention adherence, especially in the presence of a high rate of treatment non-adherence [12]. One example is a per-protocol analysis (including only participants who have not deviated from the protocol); however, this reduces the sample size, and hence the power, of the analysis, and can introduce selection bias if the remaining participants are no longer balanced across the groups. Similar concerns arise from an as-treated (or on-treatment) analysis, in which participants are analysed according to the treatment they actually received [13]. Therefore, outcomes derived from per-protocol and on-treatment analyses should be interpreted with care.
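To illustrate the concern, the following is a minimal sketch in Python contrasting an ITT estimate with a per-protocol estimate when non-adherers are excluded; all outcome data are hypothetical and not drawn from any trial in this review.

```python
# Hypothetical binary outcomes: 1 = improved, 0 = not improved.
# Suppose non-adherers in the intervention arm tend to have worse
# outcomes regardless of treatment, so dropping them unbalances the arms.
intervention = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # all randomised to intervention
adhered      = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]   # last four deviated from protocol
control      = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # all randomised to control

def mean(xs):
    return sum(xs) / len(xs)

# Intention-to-treat: analyse everyone as randomised.
itt_effect = mean(intervention) - mean(control)

# Per-protocol: drop intervention-arm deviators, shrinking the sample
# and comparing a selected subgroup against the full control arm.
per_protocol = [y for y, a in zip(intervention, adhered) if a == 1]
pp_effect = mean(per_protocol) - mean(control)

print(round(itt_effect, 3))  # 0.1
print(round(pp_effect, 3))   # 0.367
```

In this constructed example the per-protocol estimate is more than three times the ITT estimate purely because the excluded non-adherers were also those with poorer outcomes, not because the intervention worked better.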
Complier average causal effect (CACE) analysis is less likely to provide a biased estimate of the potential intervention effect as it respects the original randomisation [14]. CACE analysis estimates the effect by comparing participants in the intervention group who adhered to the treatment with participants in the control group who would have adhered had the treatment been offered to them.
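In its simplest form, a CACE estimate can be sketched as a Wald-type (instrumental variable) estimator: the ITT effect divided by the difference in adherence rates between arms. This holds only under standard assumptions (randomisation acts as an instrument, control participants cannot access the intervention, there are no defiers, and allocation affects outcomes only through receipt of treatment); the numbers below are hypothetical.

```python
# Hypothetical summary statistics (not from any trial in this review).
itt_effect = 0.10          # difference in outcome means, intervention vs control
adherence_treat = 0.70     # proportion adhering in the intervention arm
adherence_control = 0.00   # control arm has no access to the intervention

# Wald-type CACE estimate: rescale the ITT effect by the proportion
# of compliers, attributing the whole ITT effect to those who adhered.
cace = itt_effect / (adherence_treat - adherence_control)

print(round(cace, 3))  # 0.143
```

Because the denominator is a proportion, the CACE estimate is always at least as large in magnitude as the ITT effect; the lower the adherence, the greater the scaling.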
No matter the method used, in order to understand the results, it is fundamental that intervention adherence is well defined and reported. This review aims to describe current practice related to the definition, measurement, and reporting of adherence in non-pharmacological RCTs, and how this information is incorporated into the analysis and interpretation of trial results.

Searches
We searched the National Institute for Health Research (NIHR) Journals Library database for all phase III RCTs published between 1st January 2018 and 30th June 2020 under the Health Technology Assessment (HTA), Programme Grants for Applied Research (PGfAR), and Public Health Research (PHR) funding streams. We focused on the NIHR as it is the UK's largest funder of health and care research. Our review focused on the peer-reviewed reports that are a requirement upon completion of the grant [15]. We chose to focus on the reports published in the NIHR Journals Library, rather than journal articles, given the comprehensive methodological information that they provide; we felt the details required regarding the measurement and statistical handling of adherence were more likely to be located in the reports. All reports are written in English, and no search terms were needed as a manual search of all published reports within our defined period was feasible given the size of the database. The time interval (2.5 years) was considered adequate to capture current practice in trials of health interventions.

Study inclusion and exclusion criteria
RCTs were included if they used individual or cluster randomisation, with no restrictions on the number of trial arms or on trial populations. We included both superiority and non-inferiority trials, either with a parallel-group or a cross-over design, in which the intervention under observation could be of any type but not an Investigational Medicinal Product (IMP). We excluded all clinical trials of IMPs, typically phase IV, by screening the reports against the Medicines and Healthcare products Regulatory Agency algorithm and/or by searching on the European Union Drug Regulating Authorities Clinical Trials (EudraCT) Database number. Trials solely reporting on the economic evaluation of an intervention were also excluded, as well as follow-up studies of an RCT. Two authors screened all reports' titles and abstracts against the inclusion and exclusion criteria with any disagreements resolved by a third author.

Study quality assessment
Formal assessment of study quality is important in systematic reviews seeking to determine the effectiveness of an intervention, but this review sought to describe current practice regarding adherence in clinical trials with heterogeneous interventions. As such we did not assess trial quality as this would not impact our findings or their interpretation.

Data extraction strategy
One author extracted, using a piloted Microsoft Excel extraction sheet, information on study characteristics (subject area, study design, setting, description of the intervention, number of participants randomised, primary outcome measures) and on statistical analyses and results (primary analysis, and other analyses where used) for all the included studies. All included reports were then reviewed by a second author, who checked extracted data for accuracy and completeness, with disagreements resolved through consensus with a third author. Data were also collected regarding intervention adherence and its rationale, the terminology used, how adherence was measured, and whether it was considered when analysing and interpreting the results. Subject areas were subjectively defined, based on the primary nature of the intervention, as: behaviour change, surgical, non-investigational medical device, physiotherapy/rehabilitation, psychological change, or other (training/monitoring/educational) (Table 1).

Data synthesis and presentation
Included reports were highly heterogeneous with respect to population, intervention and setting, and therefore a narrative summary was performed to synthesise findings. Summary data are presented as raw numbers with associated percentages where appropriate.

Results
The search of the NIHR Journals Library (HTA, PGfAR and PHR) for the period between 1st January 2018 and 30th June 2020 yielded a total of 237 reports that were potentially eligible for inclusion (HTA n = 174; PGfAR n = 22; PHR n = 41). Thirty-four discrepancies out of 320 (10.6%) were referred to the third review author for resolution. Seventy-six (32.1%) reports met all the eligibility criteria and were included in this review. In all, 161 studies were excluded: 140 based on title information, 19 on abstract information and two following full text review. The PRISMA flow chart depicting the flow of studies through the review is shown in Fig. 1.

Adherence reporting
Among the 76 studies in the review, eight (10.5%) did not report measurements of adherence. Of those eight, five [18][19][20][21][22] included an evaluation of intervention fidelity which focused on how well the intervention was delivered rather than the extent to which it was received by the participants, two [23,24] were only related to a pharmacological element of the intervention (e.g. medication consumption) and to date one [25] had not reported findings. These studies were excluded from subsequent analysis.
Measures of adherence included the number of sessions attended (41.2%), whether participants received their allocated intervention/treatment (n = 27, 41.2%), self-report (diaries, questionnaires, SMS responses) (n = 8, 11.8%), and mixed methods, i.e. two or more of the above (n = 5, 7.3%). One study did not provide a description of how adherence was measured. Behavioural change (n = 10, 43.5%), psychological therapy (n = 5, 83.3%) and physiotherapy/rehabilitation (n = 8, 66.7%) interventions predominantly based their measurements on the number of sessions attended, whereas medical device (n = 4, 66.7%) and surgical (n = 13, 100%) trials primarily recorded the number of participants receiving the allocated intervention. A rationale for applying an adherence threshold was only provided in two (4.1%) of the 49 reports. The reasoning was based on (1) previous findings by the authors, who stated 'people would find benefit from attending at least four sessions' [56], and (2) that 'if individuals in the RAFT programme intervention arm were considered participants only if they attended the first cognitive-behavioural session (which might then have an impact on their outcome), then the offer of the intervention does affect the outcome' [38].

Incorporation of adherence in trial findings
Seventy-three (96.1%) studies analysed their primary outcome on an intention-to-treat (ITT) basis, one (1.3%) used a complete-case analysis, and in two (2.6%) the approach was not clearly stated. Forty-nine (72.1%) of the 68 reports that measured adherence performed an additional statistical analysis to attempt to quantify the possible impact of adherence on the primary outcome (Supplementary Table 2, Table 2). Higher rates were seen in those that measured adherence by intervention allocation, such as surgical and medical device trials, whilst lower rates were identified for session attendance in trials of behavioural change and psychological therapy interventions. Although the majority of trials (n = 33, 67.3%) did not show a difference in findings after conducting a sensitivity analysis for adherence rates, a third of trials did. The majority of authors considered this a dose-response effect, as adherence in these trials was mostly measured as the number of sessions attended. For eleven trials (22.4%) the interpretation of results differed when including adherence data in their statistical models for primary or secondary outcome data.

Discussion
Adherence is a key determinant of health outcomes and an important concept in the analysis and interpretation of randomised trials: appropriate consideration of adherence is now widely recommended [7,8,91]. In order to understand results from trials and inform clinical practice, it is fundamental that treatment adherence is consistently defined, considered in sensitivity analyses, and reported, but our results suggest that current practice is inconsistent. We found that although the vast majority of studies reported adherence, terminology around adherence and compliance varied widely, and over a quarter of studies did not report any analysis of adherence data. Measurement of adherence also varied greatly between clinical areas with surgical and medical device trials largely concerned with whether participants received the allocated intervention, whereas trials of rehabilitation, behavioural change, and psychological interventions were primarily concerned with session attendance. How adherence is defined, measured, and incorporated clearly matters because over a third of studies report a difference in findings between the primary and sensitivity analyses. This will have important implications for clinical practice.
The challenges in measuring and incorporating adherence within analyses are increasingly recognised [10,92], but an essential starting point for greater consistency would be the terminology used. We found variation between, and even within, studies, particularly in the conflation of adherence, compliance, and fidelity. Reducing this variation is an important first step and requires a common understanding of the terms between areas of clinical practice and members of the clinical research team. In a biopsychosocial model of care, the term adherence is more appropriate as it respects the patient's autonomy, yet the most widely used statistical technique to account for adherence, CACE analysis, is framed in terms of compliers and is therefore likely to perpetuate use of the less preferred term compliance [6,12].
Central to the WHO definition of adherence is that it is a behaviour of the patient rather than the clinician [6]. Conceptually, this is perhaps clearest in rehabilitation interventions, where the patient will determine whether they follow a prescribed exercise programme. Fidelity, on the other hand, focusses on how well the trial follows the randomisation and protocol, for example whether the clinician prescribes/delivers the correct intervention [93,94]. This dichotomy is perhaps oversimplistic, as it does not fully capture the complexity of these behaviours and the increasingly shared nature of decision making in healthcare. The balance of these two elements will also vary in different clinical situations (e.g. patients may have little say in the exact model of clinical implant used in an operation, but they will have a far greater say in whether they perform a home exercise programme).
Complex interventions and packages of care perhaps present particular challenges and raise many questions [10,95]. For example, when a relatively simple intervention is considered, such as a medication, measurement of adherence should focus on whether the participant took the medication. In contrast a programme of exercises may contain a series of weekly clinic visits, and multiple exercises performed at home each day. What should be measured in this instance? Performance of each exercise, whether a session was conducted each day, whether the patient attended the clinical session, or a combination of these elements? The situation becomes even more complex if the intervention consists of a surgical procedure and rehabilitation programme. The vast majority of rehabilitation studies in our review defined adherence as attendance at the clinic sessions but it is unclear whether this adequately reflects the total volume of exercise performed and physiological effects. Clearly any measurement must be feasible and be cognisant of the burden on participants and clinicians, but our review raises questions about whether greater emphasis should be placed on measuring what matters, rather than what is easy to measure. Equally, more attention could be paid to the relationship between how adherence is defined in a particular study and the logic model pertaining to the potential mechanism of action. Researchers could make more attempts to justify how 'adherence' was defined and consider sensitivity analyses around such definitions.
Advances in technology, such as temperature sensors and wearable activity trackers, are starting to provide potentially more objective measures for some clinical scenarios such as orthoses [96] and exercises [97] but this is unlikely to be an option in all areas of clinical practice. Elsewhere, study-specific subjective measures, such as questionnaires and diaries, have been developed but few of these are validated or used more than once which limits the comparability of study results and prohibits pooling of data in meta-analyses [10,92]. Attempts have previously been made to improve the reporting of adherence in particular clinical disciplines such as substance abuse and health behaviour change and non-pharmacological disciplines such as exercise and physical activity interventions, but the situation remains unsatisfactory [10,94,[98][99][100][101]. There is still a lack of uniform/transferable guidance, or framework for ensuring comprehensive measurement, analysis, and reporting of adherence. It also remains unclear how thresholds for adherence are determined within research into non-pharmacological therapies. For example, we often see acceptable adherence defined as a specific figure such as 80%, but how often is this based on a sound theoretical framework? In our review, only two reports provided a rationale for the adherence threshold they used. Further methodological work and guidance is urgently needed in this area. Funding bodies must work with researchers to develop evidence-based adherence thresholds which can then be applied in future clinical trials.
Our review is novel in that, to our knowledge, it is the first systematic review to consider recent practice across the spectrum of clinical research within a country. Instead of reviewing published papers, we searched for final reports from the NIHR library. This confers two principal benefits: the NIHR is the largest funder of clinical research within the UK, and its funding and project delivery are subject to extensive peer review and expert scrutiny, so it is likely to reflect contemporary high-quality research practice; and these reports are typically around 200 pages in length, so they contain much greater methodological detail than can be reported in journal publications. The primary limitation is that this work does not reflect international research practice and, therefore, we must be cautious not to generalise these findings beyond the UK. Secondly, although the length of the reports allows additional detail to be reported, it is also conceivable that we missed some information. We attempted to mitigate against this through use of a second reviewer to check for accuracy and completeness, and a third reviewer to resolve any disagreement.
We chose to include phase III clinical studies across a range of clinical areas to capture current practice in a range of clinical specialities and disciplines. Given the additional complexity of phase IV trials, and the predominance of pharmacological interventions and long-term implementation studies among them, we chose not to include them in our review as this would further increase the heterogeneity of our sample. Inclusion of a wider range of study designs should be considered for future research.
Lastly, given the absence of a universal classification system that we could apply to the range of interventions being tested in the RCTs we identified, we developed our own subjective classifications (e.g. behaviour change, psychological, etc.). Although attempts were made to capture the main emphasis of the intervention in our classification process, this method is likely to have resulted in some misclassification, as some interventions include elements from more than one category. If future research is to continue to compare research practice, more thought should be given to a repeatable classification system.

Conclusion
Our findings indicate that although the majority of clinical studies report elements of adherence there is a lack of consistency in use of key terminology, and no systematic approach to its measurement, analyses, interpretation, or reporting. Given the importance of adherence within clinical trials, we consider that further methodological research, and a framework or guidance, should be developed as a matter of urgency.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author contributions
MB, JA, and CF conceived the study. AG, KJ, and RMC conducted the literature searches, screened eligible manuscripts and conducted the data extraction. MB, JA and CF acted as a third reviewer to provide consensus where required. MB, JA, KJ, and AG drafted the manuscript which all authors reviewed. All authors read and approved the final manuscript.

Availability of data and materials
All data generated or analysed during this study are included in this published article and its supplementary material.