Keywords
Bayes Factors, digital interventions, alcohol reduction, smartphone apps
This article is included in the University College London collection.
A factorial experiment evaluating the effect of ‘enhanced’ versus ‘minimal’ versions of five components of the alcohol reduction app, Drink Less, found no clear evidence for simple effects but did find evidence that two-way combinations of certain ‘enhanced’ components together resulted in greater reductions than ‘minimal’ versions1. This was a planned analysis but should be interpreted with caution as the two-way interactive effects were not specifically hypothesised a priori and were part of multiple interactions tested. Findings of this sort are not uncommon in experimental studies. One approach is to start another randomised trial specifically to test this hypothesis. A potentially more efficient alternative is to extend the trial with further recruitment and test this and other hypotheses using Bayes factors2,3. We used this approach with the Drink Less app.
Bayes factors are a measure of strength of evidence and allow researchers to ‘top-up’ their results from one trial with additional data collected, regardless of the stopping rule, unlike frequentist statistics2. The use of Bayes factors supports efficient, incremental model building3, as evidence can be continuously accumulated until it is clear whether there is an association or not2,4. The rapid accumulation of large amounts of data about digital behaviour change interventions (DBCIs) offers the opportunity to apply emerging methods to their evaluation. DBCIs often have the capacity to continue automatic data collection beyond the end of a trial with little or no additional resources. This paper will illustrate how Bayes factors can be used to optimise a DBCI by updating evidence from an effectiveness trial using the example of Drink Less—an alcohol reduction app.
Bayes factors are the ratio of the average likelihood of two competing hypotheses being correct given a set of data and can overcome some of the issues associated with traditional frequentist statistics5. They indicate the relative strength of evidence for two hypotheses; when evaluating interventions, the two hypotheses are typically the alternative hypothesis (the intervention had the desired effect) and the null hypothesis (the intervention had no effect). Bayes factors, unlike frequentist statistics, can distinguish between two interpretations of a non-significant result: i) support for the null hypothesis of ‘no effect’ and ii) data are insensitive to detect an effect, i.e. ‘unsure about the presence of an effect’5,6. Calculating Bayes factors to supplement frequentist statistics is a quick and simple procedure with several software packages freely available (e.g. an online calculator developed by Zoltan Dienes7). Researchers are actively encouraged to supplement, or even replace, classical frequentist hypothesis testing with a Bayesian approach to provide greater interpretative value to any non-significant results8. This is important as non-significant results are often misinterpreted as evidence for no effect; a review of trials conducted in addictions research found that the reporting of ‘no difference’ was only appropriate in a small number of papers reporting this9.
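The ratio idea above can be illustrated with a toy computation. This is not the half-normal method used later in this paper; it compares the likelihood of an observed mean difference under a point alternative against the point null, with invented numbers purely for illustration:

```python
import math

def bayes_factor_point(x_bar, se, effect_alt):
    """Bayes factor for a point alternative (true mean = effect_alt)
    versus the null (true mean = 0), given a normally distributed
    estimate x_bar with standard error se."""
    def normal_pdf(x, mu, sd):
        return math.exp(-((x - mu) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))
    return normal_pdf(x_bar, effect_alt, se) / normal_pdf(x_bar, 0.0, se)

# An observed difference matching the hypothesised effect favours H1 (BF > 1);
# an observed difference of zero favours H0 (BF < 1).
bf_for = bayes_factor_point(x_bar=5.0, se=2.0, effect_alt=5.0)
bf_against = bayes_factor_point(x_bar=0.0, se=2.0, effect_alt=5.0)
```

The same ratio logic underlies Dienes's calculator; the difference is that the alternative there is a distribution of plausible effects rather than a single point.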
The use of Bayes factors also has another major advantage over the traditional frequentist approach that relates to the stopping rule. The traditional frequentist approach necessitates a strict stopping rule and a single analysis of data. Typically, this involves an a priori power calculation to specify the required sample size for data collection and the trial to end at that point. Subsequent ‘topping-up’ of existing data and re-analysing the new larger data set is ‘prohibited’10. This is because any p-value between 0 and 1 is equally likely if the null hypothesis is true, regardless of how much data are collected11. Therefore, given enough time and data collection, a significant p-value will always be obtained even if the null hypothesis is true10. So if researchers find a non-significant result—which cannot distinguish between support for the null hypothesis and being insensitive to detect an effect—then a new study would be required to build on these findings. Restarting the process is a waste of research resources but necessary in the context of using a frequentist approach for analysis because additional data collected cannot be analysed. However, this is not the case when using Bayes factors, as they are driven towards zero when the null hypothesis is true and additional data are collected10. Therefore, researchers may use Bayes factors to analyse additional data to complement an employed stopping rule2.
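The behaviour described above—Bayes factors being driven towards zero as null data accumulate—can be sketched in a small simulation. This uses a simplified point-alternative Bayes factor (not the half-normal version used in this paper) with the trial's assumed SD of 23 units and hypothesised 5-unit effect; the data are simulated, not trial data:

```python
import math
import random

def point_bf(x_bar, se, delta):
    """BF for a point alternative (mean = delta) vs the null (mean = 0),
    computed in log space for numerical stability."""
    log_bf = (2 * x_bar * delta - delta ** 2) / (2 * se ** 2)
    return math.exp(log_bf)

random.seed(1)
sigma, delta = 23.0, 5.0  # SD and hypothesised effect, as in the trial
# Simulate outcomes under a true null (mean change of zero)
data = [random.gauss(0.0, sigma) for _ in range(5000)]

def bf_at(n):
    """Bayes factor after the first n observations."""
    x_bar = sum(data[:n]) / n
    return point_bf(x_bar, sigma / math.sqrt(n), delta)

# With more null data the Bayes factor shrinks towards zero,
# whereas a p-value would keep fluctuating uniformly.
print(bf_at(50), bf_at(5000))
```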
In the evaluation of DBCIs, using Bayes factors is beginning to complement traditional frequentist statistics4,12, and analysing additional data would be of particular benefit. Data collection for a DBCI effectiveness trial is typically automated and therefore does not require additional resources to continue after a pre-specified sample size is reached. Rapid evaluations of DBCIs and efficient accumulation of evidence can be used to inform future versions, keeping pace with advances in technology. Using Bayes factors to update findings about the relative plausibility of the two hypotheses allows researchers to assess the DBCI’s effectiveness in an ongoing manner4. This remains useful when deciding whether there is sufficient evidence to demonstrate effectiveness and, therefore, continued development13. To the authors’ knowledge, no DBCIs have used additional data collected to supplement original effectiveness trial findings.
DBCIs require novel methods of evaluation that are quick and timely to inform the optimisation of the intervention14. The multiphase optimisation strategy (MOST) is a method for building, optimising and evaluating multicomponent behavioural interventions. It involves a series of steps identifying the set of intervention components to be examined and evaluating the effects of these components13,15. Factorial trial designs allow the simultaneous evaluation of the intervention components, which enables both the independent and interactive effects to be estimated13. Using a factorial trial to evaluate a DBCI can overcome some of the challenges associated with using the traditional randomised controlled trial, such as prolonged duration from recruitment to publication and a high-cost trial implementation16,17. The results from a factorial trial can be used to make decisions about which components to retain when optimising the intervention15.
The Drink Less smartphone app is a DBCI aimed at supporting people who drink excessively to reduce their alcohol consumption. It was developed using evidence and theory, following MOST. The app was analysed in a full factorial trial to assess the effectiveness of its five intervention modules and their effects on app usage and subsequent usability ratings18. The stopping rule for data collection, in line with the frequentist approach to analysis, was pre-specified, although data collection continued under the same conditions as the original factorial trial. Analysis of the original trial data using Bayes factors indicated that the data were insensitive to detect main effects but that combinations of the modules appeared effective1.
The aims of this study are substantive and methodological:
1. To update the evidence on the effectiveness of Drink Less app components singly and in combination. Specifically, what are the main and two-way interactive effects of the intervention modules on:
2. To demonstrate how Bayes factors can be used to analyse additional outcome data collected in effectiveness trials and to update beliefs about hypotheses.
A between-subjects full factorial (2⁵) trial was conducted to evaluate the effectiveness of five intervention modules in the Drink Less app. The research questions were specified prior to the trial commencing, pre-registered on ISRCTN (registration number: ISRCTN40104069) and published in an open-access protocol paper18.
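A 2⁵ factorial design crosses two versions (enhanced/minimal) of each of the five modules, giving 32 experimental conditions. The enumeration is straightforward (module names from this paper; the code itself is illustrative):

```python
from itertools import product

modules = [
    "Normative Feedback",
    "Cognitive Bias Re-training",
    "Self-monitoring and Feedback",
    "Action Planning",
    "Identity Change",
]

# Every combination of enhanced/minimal across the five modules
conditions = list(product(["enhanced", "minimal"], repeat=len(modules)))
print(len(conditions))  # 2**5 = 32
```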
Participants were included in the study if they: were aged 18 or over; lived in the UK; had an AUDIT score of 8 or above (indicative of excessive drinking19); were interested in reducing their drinking; provided an email address and had downloaded a ‘trial version’ of the app (described below).
The sample size for the original factorial trial was 672, providing 80% power (with alpha at 5%, 1:1 allocation and a two-tailed test) to detect a mean change in alcohol consumption of 5 units between the ‘enhanced’ and ‘minimal’ versions for each intervention module20, comparable with a face-to-face brief intervention21. This assumed a mean of 27 weekly units at follow-up in the control group, a mean of 22 units in the intervention group and a SD of 23 units for both (d=0.22).
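The reported sample size can be approximately reproduced with the standard normal-approximation formula for a two-sample comparison. This is a sketch under the assumptions stated above; the original calculation may have used different software or rounding (672 is also conveniently a multiple of the 32 conditions):

```python
import math

# Standard normal quantiles for alpha = 0.05 two-tailed and 80% power
z_alpha, z_power = 1.959964, 0.841621

d = 5 / 23  # 5-unit difference with an SD of 23 units, i.e. d ≈ 0.22

# Per-group n for a two-sample z-approximation, then total across both arms
n_per_group = 2 * (z_alpha + z_power) ** 2 / d ** 2
total = 2 * math.ceil(n_per_group)
print(round(d, 2), total)  # ≈ 0.22 and ≈ 666, close to the 672 recruited
```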
Recruitment was undertaken via promotion by organisations such as Public Health England and Cancer Research UK, and by listing the app in the iTunes Store according to best practices for app store optimisation.
Baseline measures included the AUDIT questionnaire and a socio-demographic assessment (age, gender, ethnic group, level of education, employment status and current smoking status). The primary outcome measure was self-reported change in past week alcohol consumption (the difference between one-month follow-up and baseline). The secondary outcome measure was self-reported change in full AUDIT score.
The Drink Less app is a DBCI for people who drink excessively to help them reduce their alcohol consumption. It is freely available on the UK version of the Apple App Store for all smartphones and tablets running iOS 8 or above. The content of the app did not change during the trial except for minor bug fixes.
The app is structured around goal setting: users can set their own goals based on units, cost, alcohol free days or calories with information on the UK drinking guidelines, units and alcohol-related harms. There are five intervention modules that aim to help users achieve their goal: Normative Feedback (providing normative feedback on the user’s level of drinking relative to others); Cognitive Bias Re-training (a game to retrain approach-avoidance bias for alcoholic drinks); Self-monitoring and Feedback (providing a facility for self-monitoring of drinking and receipt of feedback); Action Planning (helping users to undertake action planning to avoid drinking), and Identity Change (promoting a change in identity in relation to alcohol). In the trial version of the app, the five intervention modules existed in two versions: i) an ‘enhanced’ version containing the predicted active ingredients and ii) a ‘minimal’ version that acted as a control.
A detailed description of the content, development and factorial trial evaluation of the app is reported in two separate papers1,22.
Data collection for the factorial trial began on 18th May 2016 and the required sample of eligible users was reached on 10th July 2016; follow-up data were collected until 28th August 2016. Trial data were collected continuously for a further four months until 19th December 2016 under the same conditions as the original factorial trial (i.e. a ‘trial version’).
Informed consent to participate in the trial was obtained from all participants on first opening the app. Users who consented to participate completed the AUDIT and a socio-demographic questionnaire, indicated their reason for using the app and provided their email address for follow-up (a prize of £100 was offered in an attempt to decrease the proportion of users leaving this field blank). Users were then provided with their AUDIT score, and those who met the inclusion criteria were randomised to one of 32 experimental conditions using an automated algorithm within the app for block randomisation.
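The paper does not describe the app's randomisation algorithm in detail. A generic block-randomisation sketch over the 32 conditions might look like the following (`block_randomiser` is a hypothetical name, not from the app's codebase):

```python
import random

def block_randomiser(n_conditions=32, seed=42):
    """Generator yielding condition indices in shuffled blocks, so that
    every block of n_conditions participants covers each of the 2^5
    module combinations exactly once."""
    rng = random.Random(seed)
    while True:
        block = list(range(n_conditions))
        rng.shuffle(block)  # random order within the block
        yield from block

gen = block_randomiser()
first_block = [next(gen) for _ in range(32)]
# Within one complete block, every condition appears exactly once.
```

Block randomisation of this kind keeps the 32 arms balanced as recruitment accrues, which matters in a factorial design where each main effect is estimated by pooling half the arms against the other half.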
Follow-up was conducted 28 days after participants downloaded the app and the questionnaire consisted of the full AUDIT and usability measures. Follow-up was conducted in two ways: i) via email with a link to the questionnaire in an online survey tool (Qualtrics), which also sent up to four reminders, and ii) within the app. Participants included according to the original trial and stopping rule were due to complete the follow-up questionnaire up until 29th August 2016 and were contacted via email (through Qualtrics) and the app. Participants due to complete the follow-up questionnaire from 30th August onwards were only contacted via the app.
Ethical approval for Drink Less was granted by the UCL Ethics Committee under the ‘optimisation and implementation of interventions to change health-related behaviours’ project (CEHP/2013/508).
All analyses were conducted using R version 3.4.0. The analysis plan for this paper followed a similar analysis plan as for the original factorial trial (which was pre-registered on 13th February 2016; ISRCTN4010406918).
Participant characteristics were reported descriptively by intervention module. A factorial between-subjects design was used to assess the main and two-way interactive effects of the five intervention modules on the primary and secondary outcome measures. Analyses were conducted amongst responders only, i.e. those who completed the follow-up questionnaire. Bayes factors were calculated for each analysis assessing the main and two-way interaction effects of the five intervention modules on the outcome measures. The two-way interactions were defined as enhanced/enhanced versus minimal/minimal for each pair of intervention modules. The mean difference and standard error of the mean difference for each main and two-way interactive effect were calculated. A half-normal distribution was used to specify the predicted effect, with its peak at 0 (no effect) and an SD equal to the expected effect size. This is a conservative approach and represents a hypothesis that the intervention had at least some positive effect, with the effect being more likely to be smaller than larger. Bayes factors were calculated using an online calculator7.
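The calculation described above can be sketched numerically: the Bayes factor is the marginal likelihood of the observed mean difference under the half-normal prior (peak at 0, SD equal to the expected effect) divided by its likelihood under the point null. This is a simplified stand-in for Dienes's online calculator, using midpoint integration; it is not the authors' code:

```python
import math

def normal_pdf(x, mu, sd):
    return math.exp(-((x - mu) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

def bf_half_normal(mean_diff, se, expected_effect, steps=2000):
    """Bayes factor with a half-normal prior on the effect
    (peak at 0, SD = expected_effect) versus a point null at 0."""
    upper = expected_effect * 6  # integrate over ~6 prior SDs
    dx = upper / steps
    marginal = 0.0
    for i in range(steps):
        theta = (i + 0.5) * dx  # midpoint of the i-th slice
        # Half-normal density on [0, inf) is twice the normal density
        prior = 2 * normal_pdf(theta, 0.0, expected_effect)
        marginal += normal_pdf(mean_diff, theta, se) * prior * dx
    return marginal / normal_pdf(mean_diff, 0.0, se)

# Invented figures: an observed difference of zero should lean towards
# the null, while a difference matching the expected effect should not.
bf_null_like = bf_half_normal(0.0, 1.0, 5.0)
bf_alt_like = bf_half_normal(5.0, 1.0, 5.0)
```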
The expected effect size for the primary calculation of Bayes factors was a reduction of 5 units per week (d=0.22), reflecting a large effect and matching that of the power calculation for the original factorial trial. Bayes factors were also calculated for a medium effect (reduction of 3 units per week) and a small effect (reduction of 0.5 units per week) to permit a relative judgment for screening purposes. The expected effect size for the secondary outcome measure was calculated by translating the estimated effect size for the primary outcome measure (d=0.22) into the equivalent mean difference score of 1.45 (mean=19.1, SD=6.56 [based on original trial users, n=672]). Bayes factors were interpreted in terms of categories of evidential strength (see Table 1)5,23.
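The translation of the standardised effect into an AUDIT mean difference is simple arithmetic; the small discrepancy with the reported 1.45 presumably reflects rounding of d:

```python
d = 0.22          # standardised effect size from the power calculation
sd_audit = 6.56   # SD of AUDIT scores in the original trial sample (n=672)

# Mean difference on the AUDIT scale equivalent to d = 0.22
mean_diff = d * sd_audit
print(round(mean_diff, 2))  # 1.44; the paper reports 1.45
```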
The total sample size was 2586; of these, 1914 (74.0%) were additional to the 672 (26.0%) users in the original factorial trial. In total, 342 users (13.2%) completed the primary outcome measure in the follow-up questionnaire—the response rate was 26.6% amongst original users and 8.5% amongst additional users. Figure 1 shows a flow chart of users throughout the study.
Socio-demographic and drinking characteristics of participants are reported in Table 2. Participants’ mean age was 37.2 years, 53.4% were women, 95.8% were white, 74.3% had post-16 qualifications, 87.0% were employed, and 30.0% were current smokers. Mean weekly alcohol consumption was 39.0 units, mean AUDIT-C score was 9.3, and mean AUDIT score was 19.1, indicating harmful drinking. Participants’ characteristics by intervention module are reported in Table 2. Generally, characteristics were similar for the enhanced and minimal version of each intervention module.
The main effects of the intervention modules are reported in Table 3 for the change in past week’s alcohol consumption. Bayes factors showed that the data were insensitive to detect an effect for Normative Feedback for effect sizes of 5-, 3- and 0.5-unit reductions (0.47<BF<0.97). Data were insensitive to detect an effect for Cognitive Bias Re-training for effect sizes of 5-, 3- and 0.5-unit reductions (0.74<BF<1.06). Bayes factors showed that the data were insensitive to detect an effect for Self-monitoring and Feedback for effect sizes of 5-, 3- and 0.5-unit reductions (0.43<BF<0.95). Bayes factors showed that the data were insensitive to detect an effect for Action Planning for effect sizes of 5-, 3- and 0.5-unit reductions (0.83<BF<1.08). Bayes factors for Identity Change showed support for the null hypothesis of no difference between the enhanced and minimal version of the module for a 5-unit reduction (BF=0.22), though data were insensitive to detect an effect for 3- and 0.5-unit reductions (0.34<BF<0.81). The data were insensitive to detect a two-way interactive effect between any pair of intervention modules for effect sizes of 5-, 3- or 0.5-unit reductions (0.35<BF<1.22), except between Self-monitoring and Feedback and Identity Change for a 5-unit reduction, which supported the null hypothesis (BF=0.31) (see Extended data, Supplementary Table 124).
The main effects of the intervention modules are reported in Table 4 for the change in AUDIT score. The data were insensitive to detect an effect on change in AUDIT score for: Normative Feedback (BF=0.60); Cognitive Bias Re-training (BF=0.98); and Action Planning (BF=0.95). The data supported evidence for the null hypothesis of no difference in AUDIT score between enhanced and minimal versions of Self-monitoring and Feedback (BF=0.15) and Identity Change (BF=0.14). The two-way interactive effects of intervention modules on change in AUDIT score (see Extended data, Supplementary Table 224) showed that the majority of data were insensitive to detect any two-way interactive effects (0.33<BF<1.99). Data supported the null hypothesis of no difference between enhanced and minimal versions for the interactions between Normative Feedback and Identity Change (BF=0.29) and between Self-Monitoring and Feedback and Identity Change (BF=0.18).
Four intervention modules (Normative Feedback, Cognitive Bias Re-Training, Self-Monitoring and Feedback, and Action Planning) had some evidence in support of their role in reducing alcohol consumption. Therefore, an unplanned analysis was conducted to assess whether there was a larger cumulative effect of the combination of all four modules in the enhanced version compared with the minimal version. This was done for responders only (n=39; 12 “off” vs 27 “on”) and for last observation carried forward (n=324; 164 “off” vs 160 “on”) to provide potential evidence for what effect size can be expected when planning a future trial. Last observation carried forward means that participants’ past week alcohol consumption at follow-up was used for all of those who responded to follow-up, and the baseline measure for past week alcohol consumption was used for those who did not respond. Whilst last observation carried forward has its limitations, it maintains the variability within the data. Table 5 reports the Bayes factors for these analyses. There was a large numerical difference between all enhanced and all minimal for the four modules amongst responders only, although the Bayes factors found that the data were insensitive to detect an effect, which may be due in part to the small sample size.
The calculation of Bayes factors for additional data collected beyond the original factorial trial of Drink Less has allowed us to accumulate and update existing evidence on the effectiveness of its intervention components in reducing alcohol consumption. The supplemented data remained insensitive to detect whether the Drink Less app components have large (5-unit) individual or two-way interactive effects on reducing alcohol consumption, though they tended towards anecdotal evidence for the null hypothesis of no effect. The evidence of two-way interactive effects found in the original factorial trial is no longer supported by the supplemented data.
The current data also remained insensitive to detect whether the four most promising components (Normative Feedback, Cognitive Bias Re-Training, Self-Monitoring and Feedback, and Action Planning) may each have effects smaller than 5 units. An unplanned analysis provided weak anecdotal evidence of a synergistic effect of the ‘enhanced’ versions of these four intervention modules together. On both past week alcohol consumption and AUDIT score, and across several alternative effect sizes, there was support for no effect of the fifth intervention module, Identity Change. These findings, alongside results from analysing user feedback and usage data on the most frequently visited screens, guided the decision to remove the Identity Change module from the next major app update whilst retaining Normative Feedback, Cognitive Bias Re-Training, Self-Monitoring and Feedback, and Action Planning.
A major strength of this study is its illustration of how it is possible to evaluate data from trials of DBCIs in an on-going manner. No additional resources were required to continue data collection within the original trial of Drink Less. Analysing the supplemented dataset has allowed us to update our findings and provided more confidence in our original decisions on which components to retain or remove. The stopping rule in frequentist statistics means that additional trial data collected as part of an effectiveness trial for a DBCI would go to waste. The use of Bayes factors in this situation prevents unnecessary waste of resources and enables researchers to continually update their evidence on a DBCI rather than collect and analyse individual data sets as part of separate trials.
A limitation of this study and the use of Bayes factors was that we were not able to use the intention-to-treat (ITT) approach in the analysis (as was done for the original trial), whereby those lost to follow-up (non-responders) were assumed to be drinking at baseline levels. Whilst Bayes factors can overcome a lot of the issues with the frequentist approach, they are not meaningful when assumptions are made that limit the variability in the data. Due to low overall follow-up rates (13.2%) in this larger sample, the ITT assumption that there was no change in the large majority of the sample drives the variability down, which in turn drives support for the null hypothesis. This highlights that Bayes factors were not useful in this study when using the ITT assumption, which limits the variability in the data.
The intervention modules of the Drink Less app do not have a large individual effect on reducing alcohol-related outcomes, though they may have a small effect that the current data were unable to detect. There is weak evidence for a synergistic effect of the ‘enhanced’ versions of four intervention modules together: Normative Feedback, Cognitive Bias Re-Training, Self-Monitoring and Feedback, and Action Planning. This study has updated the existing evidence on the effectiveness of intervention modules in the Drink Less app. In the event of uncertain results following a primary analysis, Bayes factors can be used to ‘top-up’ results from DBCI trials with any additional data collected, therefore supporting efficient, incremental model building to inform decision-making.
A dataset containing the extended trial outcomes is available on OSF. DOI: https://doi.org/10.17605/OSF.IO/KQM8B24.
Extended data are available on OSF. DOI: https://doi.org/10.17605/OSF.IO/KQM8B24.
Supplementary Table 1. Two-way interactive effects of intervention modules on change in past week’s alcohol consumption.
Supplementary Table 2. Two-way interactive effects of intervention modules on change in AUDIT score.
Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).
JB and RW are funded by Cancer Research UK (CRUK; C1417/A22962). CG and SM are funded by CRUK and the National Institute for Health Research (NIHR)’s School for Public Health Research (SPHR). Drink Less was funded by NIHR SPHR, the UK Centre for Tobacco and Alcohol Studies (UKCTAS), the Society for the Study of Addiction (SSA), and CRUK. The views expressed are those of the author(s) and not necessarily those of the NHS, the National Institute for Health Research or the Department of Health.
The research team is part of the UKCTAS, a UKCRC Public Health Research Centre of Excellence. Funding from the Medical Research Council, British Heart Foundation, Cancer Research UK, Economic and Social Research Council and the National Institute for Health Research under the auspices of the UK Clinical Research Collaboration, is gratefully acknowledged.
The funders played no role in the design, conduct or analysis of the study, nor in the interpretation or reporting of study findings.
Is the rationale for developing the new method (or application) clearly explained?
Yes
Is the description of the method technically sound?
Yes
Are sufficient details provided to allow replication of the method development and its use by others?
Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: I have long experience in the area of research on brief interventions for hazardous and harmful alcohol consumption. However, although I have authored a publication using the Bayesian approach to hypothesis testing, I am by no means an expert on the use of Bayesian statistics.
Is the rationale for developing the new method (or application) clearly explained?
Partly
Is the description of the method technically sound?
Yes
Are sufficient details provided to allow replication of the method development and its use by others?
Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?
Partly
Competing Interests: Researcher on the InDEx app project - an app designed to help armed forces personnel monitor their alcohol consumption
Reviewer Expertise: Mobile health with a focus on alcohol misuse