Rigorous and rapid evidence assessment in digital health with the evidence DEFINED framework

Dozens of frameworks have been proposed to assess evidence for digital health interventions (DHIs), but existing frameworks may not facilitate DHI evidence reviews that meet the needs of stakeholder organizations including payers, health systems, trade organizations, and others. These organizations may benefit from a DHI assessment framework that is both rigorous and rapid. Here we propose a framework to assess Evidence in Digital health for EFfectiveness of INterventions with Evaluative Depth (Evidence DEFINED). Designed for real-world use, the Evidence DEFINED Quick Start Guide may help streamline DHI assessment. A checklist is provided summarizing high-priority evidence considerations in digital health. Evidence-to-recommendation guidelines are proposed, specifying degrees of adoption that may be appropriate for a range of evidence quality levels. Evidence DEFINED differs from prior frameworks in its inclusion of unique elements designed for rigor and speed. Rigor is increased by addressing three gaps in prior frameworks. First, prior frameworks are not adapted adequately to address evidence considerations that are unique to digital health. Second, prior frameworks do not specify evidence quality criteria requiring increased vigilance for DHIs in the current regulatory context. Third, extant frameworks rarely leverage established, robust methodologies that were developed for non-digital interventions. Speed is achieved in the Evidence DEFINED Framework through screening optimization and deprioritization of steps that may have limited value. The primary goals of Evidence DEFINED are to a) facilitate standardized, rapid, rigorous DHI evidence assessment in organizations and b) guide digital health solutions providers who wish to generate evidence that drives DHI adoption.

Sham controls are designed to blind participants to trial arm assignment and equalize engagement across arms. This approach may allow unconfounded attribution of benefit to a DHI. However, "sham apps" may mask non-specific risks associated with increased smartphone exposure, 84 because smartphone use is equal across arms in trials employing sham apps. Growing evidence 85-90 suggests that increased smartphone exposure may harm mental health.
Usual care (UC) controls (defined elsewhere 91 ) receive no treatment from the study. UC-controlled trials cannot distinguish specific effects (eg, impact of app-delivered health education) from non-specific effects (eg, impact of taking time to use an app, which may reduce time exposed to stressors). However, UC control conditions should not mask the aforementioned non-specific harms, where they exist.
Advantages and disadvantages of other control condition types, including standard of care controls, are reviewed elsewhere. 91 Sham controls may be appropriate for explanatory trials, where DHI safety has been established. UC controls may be appropriate for pragmatic trials, where the goal is to generate evidence that guides real-world decisions, and where some non-specific mechanism of benefit is acceptable.

Example meeting criterion
A high-quality trial with UC controls showed clinically and statistically significant benefit, and evaluators are comfortable with the possibility of non-specific mechanisms of benefit.

Example not meeting criterion
A high-quality trial with UC controls showed clinically and statistically significant benefit. Evaluators have stringent standards and want to know that benefits are mediated through specific mechanisms.

Trials should be registered prior to the start of enrollment. 103 Results should be shared within 12 months of trial completion 104 wherever feasible, and should be published in peer-reviewed journals.
Registration is not required for some DHI commercialization paths. This can increase publication and reporting bias, 105 reducing replicability of findings. We therefore cannot assume that future DHI deployments will be as effective as reported in unregistered trials.
Note that all interventions show distributions of effect sizes across samples. Due to selective reporting and the inconsistency of trial registration in digital health, many published DHI effect sizes may represent only the most favorable sliver of the relevant effect size distributions.

Some digital health solutions providers (DHSPs) formally self-attest to following best practices, often in collaboration with a trade organization. This may be helpful, but self-attestation is not a substitute for evidence.
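This "favorable sliver" problem can be illustrated with a small simulation. All numbers below (the true effect-size distribution and the reporting threshold) are illustrative assumptions, not empirical values:

```python
import random

random.seed(1)

# Hypothetical illustration: every trial samples from the same true
# effect-size distribution, but only the most favorable results are
# reported (eg, due to selective reporting of unregistered trials).
TRUE_MEAN_EFFECT = 0.2
trial_effects = [random.gauss(TRUE_MEAN_EFFECT, 0.15) for _ in range(1000)]

# Selective reporting: only trials clearing an impressive-looking bar
# reach publication.
reported = [e for e in trial_effects if e > 0.35]

mean = lambda xs: sum(xs) / len(xs)
print(f"True mean effect:     {mean(trial_effects):.2f}")
print(f"Reported mean effect: {mean(reported):.2f}")  # inflated relative to the true mean
```

The reported mean substantially exceeds the true mean, which is why an evaluator reading only published effect sizes may overestimate the benefit a future deployment will deliver.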

Example meeting criterion
High-quality, peer-reviewed evidence shows a mean reduction in hemoglobin A1c of 0.7, relative to no change for controls.
Example not meeting criterion
A DHSP signed a self-attestation stating that they follow best practices.

Though peer review is often expected, some DHSPs rely on "white papers." These marketing documents may show levels of rigor and transparency that are inadequate for appropriate evidence assessment.
Evidence published in predatory journals (defined elsewhere 107 ) is also inadequate.

Example meeting criterion
High-quality, peer-reviewed evidence shows a mean reduction in hemoglobin A1c of 0.7, relative to no change for controls.
Example not meeting criterion
An uncontrolled, retrospective analysis of an unreported number of patients shows robust A1c reductions. The evidence is not peer-reviewed, but rather is reported in a white paper.

Patients who enroll in health management programs often differ meaningfully from those who decline to participate. 108 For example, enrollees may have stronger motivation to self-manage chronic conditions. Matching on demographics does not resolve this.
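A toy simulation, under assumed numbers, shows why demographic matching cannot remove this self-selection bias: when a latent trait (here, motivation) drives both enrollment and outcomes, a demographics-matched comparison still shows a large apparent benefit even when the program itself has no true effect.

```python
import random

random.seed(0)

# Toy model (all parameters are illustrative assumptions): latent motivation
# drives both enrollment and outcome improvement; the program has ZERO true
# effect, and age -- the matched demographic -- is unrelated to enrollment,
# so age-matching changes nothing.
people = []
for _ in range(20000):
    age = random.randint(40, 70)
    motivation = random.random()         # latent self-management motivation
    enrolled = motivation > 0.6          # motivated patients opt in
    improvement = motivation + random.gauss(0, 0.1)  # driven by motivation only
    people.append((age, enrolled, improvement))

mean = lambda xs: sum(xs) / len(xs)
enrollee_gain = mean([p[2] for p in people if p[1]])
decliner_gain = mean([p[2] for p in people if not p[1]])
print(f"Apparent benefit despite zero true program effect: "
      f"{enrollee_gain - decliner_gain:.2f}")  # substantial positive difference
```

The comparison inherits the motivation gap between enrollees and decliners, which no amount of demographic matching can close.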

Example meeting criterion
The rate of acute clinical events for DHI users is 15% lower than that of randomly assigned, waitlisted controls.
Example not meeting criterion
The rate of acute clinical events for DHI users is 15% lower than that of demographics-matched adults who declined to participate.

As DHI deployment scales up, or as business models evolve, intervention components previously implemented by program staff may be automated. Reducing human interaction may reduce effectiveness in some cases. 110

Example meeting criterion
High-quality evidence of efficacy was generated for an automated DHI product version.
Example not meeting criterion
High-quality, peer-reviewed evidence was generated for a DHI version incorporating video chat with a clinical pharmacist. After a pivotal trial, this intervention component was automated. No post-automation evidence is available.

This criterion pertains only to DHIs for which evidence has been generated in both registered and unregistered trials.
Trial registration (eg, through clinicaltrials.gov) is not required for some commercialization paths. This may increase publication bias 105 and reduce the likelihood of replicating reported effect sizes.
We do not expect any two studies to show identical effect sizes. But if effect sizes for registered and unregistered trials differ to a clinically meaningful degree, this may raise concern for publication bias.
Consider investigating differences across studies that may explain any effect size inconsistencies. Such differences may relate to DHI versions, implementation protocols, sample characteristics, or sample sizes (small samples increase risk for outlying effect sizes).
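The screening step described above can be sketched in code. The function name and the 0.3 threshold are hypothetical; a real assessment would pre-specify a clinically meaningful difference for the outcome in question:

```python
def effect_sizes_consistent(registered, unregistered, meaningful_diff):
    """Return True when mean effect sizes from registered and unregistered
    trials differ by less than a pre-specified clinically meaningful
    difference. Name and threshold are illustrative assumptions."""
    mean = lambda xs: sum(xs) / len(xs)
    return abs(mean(registered) - mean(unregistered)) < meaningful_diff

# Hemoglobin A1c reductions, with a hypothetical 0.3 threshold:
print(effect_sizes_consistent([0.7], [0.5], meaningful_diff=0.3))  # True
print(effect_sizes_consistent([0.7], [0.1], meaningful_diff=0.3))  # False
```

A False result would not prove publication bias, but it would trigger the investigation of cross-study differences described above.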

Example meeting criterion
High-quality, peer-reviewed evidence shows mean reductions in hemoglobin A1c of 0.7 and 0.5, both relative to no change observed for controls, in registered and unregistered trials, respectively.
Example not meeting criterion
High-quality, peer-reviewed evidence shows mean reductions in hemoglobin A1c of 0.7 and 0.1, both relative to no change observed for controls, in registered and unregistered trials, respectively.

Summary of literature reviewed and conclusions
The fact that no framework met all four criteria may underscore the need for a novel approach. Few frameworks explicitly recommend use of well-developed, extant methods (eg, GRADE 1 ). This suggests that there may be an opportunity to improve the quality of DHI assessment in the digital health community through consistent use of mature best practices. The failure of most frameworks to address DH-specific evidence quality considerations, and of all frameworks to address areas requiring increased vigilance, suggests that the common issues listed in Supplementary Table 2 may not be addressed adequately in prior frameworks. Finally, the rarity of evidence-to-recommendation guidelines suggests that prior frameworks may leave the evaluator without clear guidance regarding the practical implications of assessment findings. For example, if a prior framework is applied and yields "moderate evidence" supporting safety and effectiveness, it may be unclear what degree of adoption is appropriate.
Again, this underscores the need for novel approaches to evidence assessment in digital health.

Updating Process for the Evidence DEFINED Framework
Given the rapid evolution occurring in digital health, Evidence DEFINED will be updated every 6-12 months. Feasibility is a priority in the design of this process. A public suggestion form will be posted through the website of the Digital Medicine Society (DiMe) no more than 3 months after publication of the Framework. This will mimic a simple form 113 posted previously on a Digital Medicine Society website 114 to gather public feedback for a prior, similar initiative. 115 Availability of the form will be announced to DiMe members and through public channels.
For each update cycle, the text of form submissions will be deidentified and compiled in a single document. Leadership from DiMe and the Evidence DEFINED Workgroup will summarize each proposed suggestion and share the compiled information with the Workgroup for feedback. Live Workgroup meetings may be convened, as needed, to address any complex suggestions. Subsequently, the following materials will be revised, as appropriate, to reflect approved updates: Evidence DEFINED Overview and Quick Start Guide (Figure 1), Supplementary Checklist of Evidence Quality Criteria for Digital Health Interventions (Supplementary Table 2), and Evidence-to-Recommendation Guidelines (Table 2). Updated Evidence DEFINED materials will be posted on DiMe's website.
Updating cycles will continue for two years. DiMe and Workgroup leadership will then decide whether to continue updating the Framework, depending on the frequency of public suggestions and resource availability.