A Library of Analytic Indicators to Evaluate Effective Engagement with Consumer mHealth Apps for Chronic Conditions: Scoping Review

Background: There is mixed evidence to support current ambitions for mobile health (mHealth) apps to improve chronic health and well-being. One proposed explanation for this variable effect is that users do not engage with apps as intended. The application of analytics, defined as the use of data to generate new insights, is an emerging approach to study and interpret engagement with mHealth interventions.
Objective: This study aimed to consolidate how analytic indicators of engagement have previously been applied across clinical and technological contexts, to inform how they might be optimally applied in future evaluations.
Methods: We conducted a scoping review to catalog the range of analytic indicators being used in evaluations of consumer mHealth apps for chronic conditions. We categorized studies according to app structure and application of engagement data and calculated descriptive data for each category. Chi-square and Fisher exact tests of independence were applied to test for associations between coded variables.
Results: A total of 41 studies met our inclusion criteria. The average mHealth evaluation included for review was a two-group pretest-posttest randomized controlled trial of a hybrid-structured app for mental health self-management; it had 103 participants, lasted 5 months, did not provide access to health care provider services, measured 3 analytic indicators of engagement, segmented users based on engagement data, applied engagement data for descriptive analyses, and did not report on attrition. Across the reviewed studies, engagement was measured using the following 7 analytic indicators: the number of measures recorded (76%, 31/41), the frequency of interactions logged (73%, 30/41), the number of features accessed (49%, 20/41), the number of log-ins or sessions logged (46%, 19/41), the number of modules or lessons started or completed (29%, 12/41), time spent engaging with the app (27%, 11/41), and the number or content of pages accessed (17%, 7/41). Engagement with unstructured apps was mostly measured by the number of features accessed (8/10, P=.04), and engagement with hybrid apps was mostly measured by the number of measures recorded (21/24, P=.03). A total of 24 studies presented, described, or summarized the data generated from applying analytic indicators to measure engagement. The remaining 17 studies used or planned to use these data to infer a relationship between engagement patterns and intended outcomes.
Conclusions: Although researchers measured on average 3 indicators in a single study, the majority reported findings descriptively and did not further investigate how engagement with an app contributed to its impact on health and well-being. Researchers are gaining nuanced insights into engagement but are not yet characterizing effective engagement for improved outcomes. Raising the standard of mHealth app efficacy through measuring analytic indicators of engagement may enable greater confidence in the causal impact of apps on improved chronic health and well-being.


Background
There is mixed evidence to support current ambitions for mobile health (mHealth) apps to improve chronic health and well-being [1]. While some apps have demonstrated efficacy in definitive trials [2-5], others have performed poorly [6-9]. One proposed explanation for this variable effect is that users do not engage with apps as intended [10]. The construct of engagement has been quantitatively conceptualized as the amount, duration, breadth, and depth of intervention usage [11,12]. For many mHealth app evaluations, users can be segmented along a continuum of engagement; some will never use the app, some will use it but quickly abandon it, and some will use it in unexpected ways. Complex patterns of engagement with mHealth apps are emerging and challenge current conceptual paradigms for interpreting their impact on chronic health outcomes. These digitally mediated mechanisms of action require more granular evaluations capable of analyzing multilevel, temporally dense engagement data [13]. Evaluating engagement is therefore a priority and calls for the integration of nonintrusive measures of this construct in mHealth evaluation methodology [14].
Recently, scholars sought to further the conceptualization of engagement by proposing that it may be more valuable to identify the mechanisms that underlie effective engagement, defined as sufficient engagement with an intervention to achieve intended outcomes [14,15]. The construct of effective engagement differs conceptually from both engagement and adherence, which have historically been used interchangeably [16]. Sieverink et al reason that the following 3 elements are necessary to determine adherence to a digital health intervention: (1) the ability to measure usage behaviors, (2) an operationalization of intended use, and (3) an empirical, theoretical, or rational justification of intended use [17]. We propose that effective engagement is more intentional than engagement but less justified than adherence. It sits between both constructs and bridges the transition from identifying patterns of engagement toward evidencing their capacity to achieve intended outcomes.
There has been recognition that the definition of engagement has evolved to include offline interactions with the behavior change mediated by a digital health intervention. Yardley et al have been instrumental in furthering this conceptualization of engagement by suggesting that there are 2 levels of engagement: (1) the micro level of immediate engagement with the digital health intervention and (2) the macro level of engagement with the wider intervention-mediated behavior change [14]. They describe engagement as a dynamic process marked by shifts in both micro- and macroengagement, which will vary depending on the intervention, the user, and their context. Users may be macroengaging and experiencing positive behavior change, but this may not necessarily be reflected in their microengagement analytics data. In acknowledgment of this distinction between engagement with the technological and behavioral aspects of an intervention, Yardley et al argue that microengagement alone cannot be taken as a valid indicator of effective engagement. We do not dispute these arguments and recognize the limitations of relying solely on microengagement data to infer effective engagement. However, we posit that measuring and reporting on microengagement is fundamental to understanding how people actually use an app to improve their health and well-being. In turn, these analytic insights can be coupled with measures of macroengagement to identify the mediating mechanisms that motivate effective engagement.
The application of analytics, defined as the use of data to generate new insights [18], is an emerging approach to study and interpret engagement with mHealth interventions [19]. Van Gemert-Pijnen et al have advanced the application of log data analysis to inform how an intervention works in practice and which components should be improved to yield greater benefit [20-22]. Arden-Close et al have developed and implemented a novel R-based tool to visually explore patterns of engagement [23]. Heckler et al have called for the adoption of a continuous optimization model of evaluation that leverages simulated computational models to predict how users might engage with an intervention before data collection [24]. Scherer et al have demonstrated the value of joint models in the analysis of longitudinal engagement data. Scherer et al also recently participated in a workshop sponsored by the National Institutes of Health on emerging technology and data analytics for behavioral health and advocated for new analytic methods that can scale to thousands of individuals and billions of data points [19]. Short et al recently published a viewpoint on engagement measurement options that can be employed in electronic health (eHealth) and mHealth behavior change intervention evaluations [25]. They found that system engagement data are the most commonly collected and reported measures of engagement in eHealth and mHealth interventions, and they recommend that the field develop shared ways of conceptualizing these data so that their categorization can be consolidated as the field progresses.

Objectives
Motivated by the demonstrated value of analytics to study engagement with mHealth apps, we sought to compile and catalog a library of analytic indicators of engagement with consumer mHealth apps for self-managing chronic conditions. We defined analytic indicators as proxy measures of engagement with an mHealth app based on objective usage that generates log data [14,22]. When positioned alongside other measures suitable for evaluating the subjective experience of mHealth app engagement, they may provide complementary data-driven insights into the objective extent of engagement. We propose that analytic indicators of engagement do exactly this: they indicate that users may be engaging effectively with a digital health intervention but do not definitively confirm a relationship between engagement and intended outcomes. Establishing this relationship requires adopting a mixed-methods multidimensional approach to measure effective engagement using multiple assessment strategies [14,25]. While many researchers have included analytic indicators as a study measure when evaluating apps, they are not consistent or systematic in their selection [26]. We propose that there is benefit to understanding how engagement with mHealth apps for chronic conditions has been defined, measured, and analyzed across evaluations. The aim of this scoping review was therefore to consolidate how analytic indicators of engagement have previously been applied across clinical and technological contexts to inform how they might be optimally applied in future evaluations.

Review Framework
This scoping review was guided by the methodological framework developed by Arksey and O'Malley [27] and advanced by Levac et al [28]. They endorse an iterative review process with 5 distinct steps: (1) identifying the research question, (2) searching for relevant studies, (3) selecting studies, (4) charting the data, and (5) collating, summarizing, and reporting results. This framework is particularly relevant to disciplines with emerging evidence, such as mHealth, in which the paucity of definitive research makes it difficult for researchers to undertake systematic reviews [28]. In this context, conducting a scoping review allowed us to incorporate a range of study designs beyond those accepted for inclusion in systematic reviews, to generate broad findings on how researchers are measuring engagement with consumer mHealth apps for chronic conditions. We made efforts to adhere to recommendations for each step, starting with the selection of a research question that was sufficiently broad to map the extent, range, and nature of mHealth engagement research activity. We conducted this review to explore the following research question: what analytic indicators of engagement are being used in evaluations of consumer mHealth apps for chronic conditions?

Search Strategy
A literature search was conducted in the MEDLINE, PsycINFO, CINAHL, and EMBASE databases. In addition, the Journal of Medical Internet Research and its sister journals were independently searched given their frequent and high-impact publication of mHealth research. A combination of different keywords for the constructs "engagement" and "mHealth" was used. No search terms for chronic conditions were defined a priori to broaden search results. We adopted the World Health Organization's definition of a chronic condition as a "non-communicable disease of long duration and slow progression" [29]. Multimedia Appendix 1 presents our search strategy for MEDLINE on the Ovid platform.

Eligibility Criteria
Titles and abstracts retrieved from the search strategy were screened for inclusion against the following criteria: (1) the article described an evaluation or a protocol for an evaluation of a consumer mHealth app for self-managing a chronic condition; (2) the study included operationalization of an engagement-related construct (Multimedia Appendix 1 provides the full list of screened constructs); (3) the study included objective, quantifiable measurements using log data analytics; (4) the app was intended to be used more than once; (5) the article was published between November 1, 2015, and November 1, 2017; and (6) the article was published in English.
Studies were excluded if (1) the mHealth app was solely an appointment reminder service; (2) the primary app technology was short message service or interactive voice response; (3) the app was for an acute condition or preventive health purposes; (4) the app was a support tool for a patient's circle of care; (5) the app did not require user input through active or passive (sensor) data entry; (6) the app only delivered educational content; and (7) the article primarily described the design, development, or usability testing of the app.

Data Collection and Analysis
The first author conducted the electronic searches with support from a faculty-affiliated librarian and reviewed the reference lists of relevant articles. All identified titles and abstracts were downloaded and merged using Mendeley (Elsevier) [30], and duplicate records were removed. The first author independently screened all titles and abstracts against the eligibility criteria. Articles whose eligibility was uncertain were retained until data extraction, when more information was available to make an informed decision on inclusion. Following title and abstract review, the full texts of included abstracts were assessed for final selection by all study authors.
Textbox 1. Codes extracted from included articles.
1. General information regarding the study title, authors, journal, year, and country.
2. App information, specifically the public name, chronic condition addressed, and accessibility of health care provider services.
3. Study information, specifically the purpose, duration, sample size, and design.
4. App structure (structured, hybrid, or unstructured): "Structured" apps contained locked, sequential components (eg, modules, lessons, and features) that users had to complete before moving forward. "Hybrid" apps contained both fixed core components and variable components for free use. "Unstructured" apps contained variable components that users could access and use at will.
5. Analytic indicators used to measure engagement, specifically the number of log-ins or sessions logged, the number of modules or lessons started or completed, the number of features accessed, the number of measures recorded, the number or content of pages accessed, the frequency of interactions logged, and total time spent engaging with the app.
6. Engagement-based segmentation: studies that segmented users based on engagement data (eg, "of the users who logged in at least five times…") were assigned this code.
7. Application of engagement data (descriptive or inferential): a "descriptive" code was assigned to studies that presented, described, or summarized engagement data. An "inferential" code was assigned to studies that used engagement data to predict the intended outcome. Outcome types were coded for studies that applied engagement data inferentially.
8. Attrition type (dropout or nonusage) and statistical method of analysis: dropout attrition is the phenomenon of users not returning to complete follow-up study activities. Nonusage attrition is the phenomenon of users losing interest in a digital health intervention and ceasing to use it [10].
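The two attrition codes in Textbox 1 are independent of each other, which is easy to lose when both are called "attrition." The sketch below codes them separately for a single participant. It is illustrative only: the `Participant` fields and the 14-day `inactivity_window` used to operationalize "ceasing to use" the app are hypothetical assumptions, not criteria applied in this review or in the reviewed studies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # Hypothetical fields, for illustration only.
    user_id: str
    last_active_day: Optional[int]  # last study day with any logged app event; None if never used
    completed_follow_up: bool       # completed follow-up study activities (eg, final survey)


def attrition_types(p, study_days, inactivity_window=14):
    """Return the set of attrition codes that apply to one participant.

    dropout: did not complete follow-up study activities.
    nonusage: never used the app, or logged no events during the final
    `inactivity_window` days of the study (a hypothetical cutoff).
    """
    codes = set()
    if not p.completed_follow_up:
        codes.add("dropout")
    if p.last_active_day is None or p.last_active_day <= study_days - inactivity_window:
        codes.add("nonusage")
    return codes
```

For example, a participant who stopped opening the app in the second week of a 90-day study but still completed the final survey would be coded for nonusage attrition only.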
A data extraction form was developed by the first author to extract relevant study information. We referenced work by Sieverink [17] and Kelders [31] on analytic indicators of adherence to eHealth technologies to establish preliminary codes. The form was piloted on a sample of included articles to validate proposed codes and add emergent codes. The codes extracted from each study are presented in Textbox 1. All study data were entered into SPSS version 24 (IBM) [32]. Each study along with its corresponding data was treated as a separate case. We categorized studies according to app structure and application of engagement data and calculated descriptive data for each category. Chi-square and Fisher exact tests of independence were applied to test for associations between coded variables. A Monte Carlo correction was applied when observed counts were below expected counts.
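To make the Fisher exact test concrete, the following is a minimal pure-Python sketch of the two-sided test for a 2×2 table of coded variables (eg, app structure by descriptive vs inferential use of engagement data). It is not the SPSS procedure used in this review and omits the Monte Carlo correction; it conditions on the table margins and sums the hypergeometric probabilities of all tables no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact test for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    # All values the top-left cell can take while keeping the margins fixed.
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    # Small relative tolerance so tables tied with the observed one are not lost to rounding.
    p_value = sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7))
    return min(1.0, p_value)
```

A hypothetical call such as `fisher_exact_two_sided([[9, 1], [12, 19]])` would return the two-sided P value for that table; the counts are invented for illustration and are not data from the reviewed studies.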

Study Selection
A total of 1873 articles were identified through the database search. Of the 60 full texts screened, 19 were excluded, 8 of which did not include objective, quantifiable measurements using log data analytics. In total, 41 articles comprising 33 studies and 8 protocols met the eligibility criteria and were included for review. Figure 1 presents the Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram of the study selection process [33].

Methodological Characteristics
The first authors of reviewed studies were affiliated with institutions in the

Analytic Indicators
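Across the reviewed studies, engagement was measured using the following 7 analytic indicators in order of prevalence: the number of measures recorded (76%, 31/41), the frequency of interactions logged (73%, 30/41), the number of features accessed (49%, 20/41), the number of log-ins or sessions logged (46%, 19/41), the number of modules or lessons started or completed (29%, 12/41), time spent engaging with the app (27%, 11/41), and the number or content of pages accessed (17%, 7/41).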

Number of Measures
Of the analytic indicators identified in this review, the number of measures recorded by users on an app was the most commonly used indicator of engagement with mHealth apps for chronic conditions. Researchers evaluated a range of measures that aligned with their target chronic condition, such as blood glucose [56,60,61,64,73], weight [56,73,74], symptoms [66,68,69], patient-reported outcomes [38,46,52,65,71], diary entries [47,66], and steps [51]. There was some overlap in the types of measures being collected across apps targeting the same chronic conditions, such as the number of blood glucose readings recorded as an indicator of engagement with diabetes apps. Overall, the target chronic condition and functionality of the app under study ultimately determined which measures would be collected and subsequently reported as an analytic indicator of engagement.
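As an illustration of how this indicator is typically derived from log data, the sketch below counts measure-recording events per user. The event log format (user ID, event type) and the event type names in `MEASURE_EVENTS` are hypothetical; a real app would define its own schema, and the set of qualifying events would follow from the target condition as described above.

```python
from collections import Counter

# Hypothetical event types that count as "recording a measure."
MEASURE_EVENTS = {"glucose_reading", "weight_entry", "symptom_report", "diary_entry"}


def measures_recorded(events):
    """Count measure-recording events per user.

    events: iterable of (user_id, event_type) pairs from an app's log data.
    Non-measure events (eg, log-ins) are ignored.
    """
    return Counter(user for user, kind in events if kind in MEASURE_EVENTS)
```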

Frequency of Interactions
The frequency of interactions logged was the second most prevalent analytic indicator of engagement. Researchers often complemented the number of measures recorded on an app with the frequency with which those measures were recorded. Stratifying the frequency of interactions by specific date ranges was also common; Davies et al measured the number of users who used a mental health app at least once after 1 week, 4 weeks, and 20 weeks [38]. They also applied within-date-range indicators such as the number of users who used the app once, 2 to 3 times, 4 to 6 times, or 6 or more times per week. Some researchers assigned a benchmark number of days to signify engagement; for example, Isetta et al measured the number of users who engaged with an app for sleep apnea on at least 66% of all days in the study [67]. Others treated reaching a specific day as an indicator of engagement; for example, Jamison et al measured the number of users who continued to submit daily assessments of their chronic pain after 90 and 180 days [48]. Layering this analytic indicator over other indicators added temporal context to better understand how users were engaging over time.
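The date-range and benchmark approaches described above can be sketched as simple functions over the dates on which a user logged any interaction. This is one possible operationalization under stated assumptions: study days are counted inclusively from `start` to `end`, "used after week k" means any interaction on or after day 7k, and the 0.66 default mirrors the 66% benchmark reported by Isetta et al. The function names are hypothetical.

```python
from datetime import date


def active_day_fraction(active_dates, start, end):
    """Proportion of study days (start..end inclusive) with at least one logged interaction."""
    total_days = (end - start).days + 1
    days_with_use = {d for d in active_dates if start <= d <= end}
    return len(days_with_use) / total_days


def used_after_week(active_dates, start, week):
    """True if any interaction was logged on or after day 7 * week of the study."""
    return any((d - start).days >= 7 * week for d in active_dates)


def meets_benchmark(active_dates, start, end, threshold=0.66):
    """True if the user engaged on at least `threshold` of all study days."""
    return active_day_fraction(active_dates, start, end) >= threshold
```

Repeated interactions on the same day are collapsed into one active day, so this measures how regularly a user engaged rather than how much.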

Number of Features
The range of features accessed by users in an app was frequently measured as an analytic indicator of engagement. Researchers primarily logged (1) the number of features accessed and (2) the number of times each feature was accessed. In their trial of the Veterans Affairs' Comprehensive Assistance for Family Caregivers Program, where users were provided with access to a suite of 6 apps for posttraumatic stress disorder (PTSD) self-management, Frisbee et al measured the number of unique apps used in the suite [39]. To better understand user preferences between 2 features of their app for schizophrenia self-management, Ben-Zeev et al measured the number of times users chose the video feature over the written content feature [36]. Our research group proposed exploring whether users would access all the features made available in their app for prostate cancer survivorship care, particularly whether users would enable caregiver permissions or write notes to document changes in their care [71]. Overall, researchers applied this analytic indicator to explore the breadth of app engagement and inform feature popularity and relevance for the target population.
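The two variants of this indicator, the number of unique features each user accessed and the number of times each feature was accessed, can both be computed in one pass over a feature-access log. The sketch below assumes a hypothetical log of (user ID, feature name) pairs; comparing per-feature counts (eg, a video feature vs a written content feature) is one way to examine feature preferences.

```python
from collections import Counter, defaultdict


def feature_usage(events):
    """Summarize feature access from (user_id, feature_name) log pairs.

    Returns:
      unique_per_user: dict mapping each user to the number of distinct features accessed
      per_feature: Counter of total accesses per feature across all users
    """
    features_by_user = defaultdict(set)
    per_feature = Counter()
    for user, feature in events:
        features_by_user[user].add(feature)
        per_feature[feature] += 1
    unique_per_user = {user: len(features) for user, features in features_by_user.items()}
    return unique_per_user, per_feature
```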

Number of Log-Ins
The number of log-ins or sessions logged by users was another commonly used analytic indicator of engagement. This indicator was often coupled with the frequency of interactions logged to standardize counts. Researchers also frequently measured the number of users who opened an app at least once to distinguish them from users who had downloaded the app but never logged any subsequent activity. Owen et al did both, measuring the number of sessions logged by users on their PTSD self-management app as well as the number of users who logged at least one session in the first day, week, and month after download [42]. Researchers used this analytic indicator to reflect the shift from adoption to habituation, with a greater number of log-ins or sessions denoting greater engagement.
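A sketch of the log-in measures described above, under assumptions: each session is represented by its start time, the download time is known, and "first month" is approximated as 30 days. The `WINDOWS` values and the `login_summary` name are hypothetical choices for illustration, not the definitions used by Owen et al.

```python
from datetime import datetime, timedelta

# Hypothetical post-download windows; "first month" is approximated as 30 days.
WINDOWS = {
    "first_day": timedelta(days=1),
    "first_week": timedelta(days=7),
    "first_month": timedelta(days=30),
}


def login_summary(download_time, session_starts):
    """Count sessions and flag whether any session began within each post-download window."""
    summary = {
        "sessions": len(session_starts),
        # Separates users who opened the app from those who downloaded it but never used it.
        "ever_opened": len(session_starts) > 0,
    }
    for name, window in WINDOWS.items():
        summary[name] = any(download_time <= s < download_time + window for s in session_starts)
    return summary
```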

Number of Modules
When defining analytic indicators for categorization, we differentiated between unrestricted and restricted data collection. Unrestricted data collection was defined as data that could be entered into an app at a frequency or volume dictated by the user, such as the number of blood glucose readings or medications recorded [64]. Restricted data collection was defined as requiring the user to enter data according to a set frequency or volume, such as a list of assigned articles to be read [74] or challenges to be completed [57]. We coded studies reporting unrestricted data collection as number of measures and coded studies reporting restricted data collection as number of modules. A range of studies measured the number of outcome surveys completed from those assigned [45,68,75]. Others assessed the number of videos watched from a playlist [36,55], educational modules completed [52], or self-care advice accessed [69]. Overall, researchers studying apps with modular content considered module completion to be indicative of engagement and consequently, tracked module progression and completion rates.
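Because restricted data collection has a fixed denominator (the modules, surveys, or videos assigned), module engagement is naturally expressed as a completion rate. The sketch below assumes hypothetical module identifiers; completed items that were never assigned are ignored so that the rate cannot exceed 1.

```python
def module_completion_rate(assigned, completed):
    """Share of assigned modules the user completed (restricted data collection).

    assigned: identifiers of modules, lessons, surveys, or videos assigned to the user
    completed: identifiers the user completed; unassigned items are ignored
    """
    assigned = set(assigned)
    if not assigned:
        raise ValueError("no modules were assigned")
    return len(assigned & set(completed)) / len(assigned)
```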

Time Spent
The amount of time that users engaged with an app was considered by a subset of researchers to be an analytic indicator of engagement. Researchers measured the time spent on unique sections of an app [66], the time spent on unique pages [56], the length of a unique session [38,42,43,71], the length of time between sessions [72], and the total time spent on an app [62,68,73]. Davies et al also separately counted sessions lasting 30 to 60 seconds [38]. Measuring time spent engaging with an app helped researchers to distinguish between exploratory and purposeful engagement; a rapid succession of short page views was indicative of scanning through content, whereas prolonged viewing suggested greater intention and interest in content. Overall, this analytic indicator helped researchers define accurate session duration parameters for session-based analytics.
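Session duration depends on how a session is defined. One common approach, shown here as a sketch, splits a user's event timestamps into sessions wherever the gap between consecutive events exceeds an inactivity timeout. The 30-minute default is an assumption borrowed from general web analytics practice, not a parameter reported by the reviewed studies. Note that a session with a single event has a duration of 0 seconds under this definition, one reason session parameters need care.

```python
from datetime import datetime, timedelta


def session_durations(timestamps, timeout=timedelta(minutes=30)):
    """Split event timestamps into sessions at gaps longer than `timeout`.

    Returns each session's duration in seconds (last event minus first event).
    """
    ts = sorted(timestamps)
    if not ts:
        return []
    durations = []
    start = prev = ts[0]
    for t in ts[1:]:
        if t - prev > timeout:
            # Gap exceeds the timeout: close the current session and start a new one.
            durations.append((prev - start).total_seconds())
            start = t
        prev = t
    durations.append((prev - start).total_seconds())
    return durations


def short_sessions(durations, low=30, high=60):
    """Sessions lasting between `low` and `high` seconds (inclusive)."""
    return [d for d in durations if low <= d <= high]
```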

Number of Pages
The number of pages accessed by users was logged by researchers to reflect overall patterns of app engagement and discoverability of specific content. Kuhn et al measured the number and content of pages visited by users in their app for PTSD self-management, as did other researchers [38,41,71]. Taki et al combined session analytics with page analytics and measured the number of pages viewed per session in their app for obesity self-management [72]. Owen et al recorded click stream data documenting their users' navigation through page content [42]. Insights gleaned from this analytic indicator provided researchers with a broader understanding of the user journey through an app and drew attention to specific content that might drive engagement.
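Page analytics of the kind described above (pages per session, as in Taki et al, and navigation paths from clickstream data, as in Owen et al) can be sketched from sessions represented as ordered lists of page identifiers. The page names and the choice to summarize navigation as page-to-page transition counts are assumptions for illustration.

```python
from collections import Counter


def page_metrics(sessions):
    """Summarize page views from sessions given as ordered lists of page identifiers.

    Returns (mean pages per session, view count per page, page-to-page transition counts).
    """
    views = Counter(page for session in sessions for page in session)
    # Consecutive page pairs approximate the user's navigation path (clickstream).
    transitions = Counter(
        (a, b) for session in sessions for a, b in zip(session, session[1:])
    )
    mean_pages = sum(len(s) for s in sessions) / len(sessions) if sessions else 0.0
    return mean_pages, views, transitions
```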

Conceptual Categories of Analytic Indicators
We sought to conceptually clarify the 7 identified analytic indicators by grouping them according to the 4 categories that constitute the quantitative conceptualization of engagement: amount, duration, breadth, and depth [11,12]. Table 3 presents an overview of the categories, their comprised analytic indicators, and the number of reviewed studies that fall into each category. The focus of most reviewed studies was on the depth (76%, 31/41) and amount of engagement (73%, 30/41). There was less attention on the breadth (49%, 20/41) and duration (27%, 11/41) of engagement. These findings suggest that a subset of researchers are either not measuring the breadth and duration of engagement in their mHealth evaluations or underreporting the findings.

Application of Engagement Data
Of the 41 studies included for review, 24 presented, described, or summarized the data generated from applying analytic indicators to measure engagement. The remaining 17 studies used or planned to use these data to infer a relationship between engagement patterns and intended outcomes.

Clinical Outcomes
Over half of the studies that applied engagement data inferentially assessed the relationship between engagement and clinical outcomes (53%, 9/17). Toro-Ramos et al measured the number of weeks users engaged with their hypertension self-management app and found that users with sustained usage across 19 weeks experienced significant reductions in systolic blood pressure and weight [74]. In their trial of an app for PTSD self-management, Kuhn et al applied the number of days and weeks users engaged with the app as a predictor variable for changes in PTSD symptoms but did not find a significant relationship [41]. Goyal et al segmented all users who reported 5 or more blood glucose readings a day into a subgroup for secondary analyses and found a significant relationship between increased readings and improved glycated hemoglobin after 6 months [59]. They also identified a significant relationship between entering a reading on at least 3 days a week and improved daily blood glucose self-monitoring. Overall, there was some evidence of predictive validity across reviewed studies, with engagement correlating with improved clinical outcomes. However, the majority of analyses conducted to establish this predictive validity relied on nonexperimental variations in engagement due to nonadherence or implementation infidelity. Future evaluations assessing the relationship between engagement and clinical outcomes should consider alternative trial designs with multiple randomizations to ensure that findings are not biased by confounding [76-78].
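The engagement-based subgroup analyses described above follow a simple pattern: split participants at an engagement threshold (eg, 5 or more readings a day, as in Goyal et al) and compare mean outcome change between subgroups. The sketch below shows only that descriptive comparison with hypothetical inputs; as noted above, such comparisons are observational and confounded, so a real analysis would need appropriate statistical testing and adjustment or a randomized design.

```python
from statistics import mean


def compare_by_engagement(participants, threshold):
    """Mean outcome change for high- vs low-engagement subgroups.

    participants: iterable of (engagement_level, outcome_change) pairs
    threshold: engagement level at or above which a participant is "high"
    Returns (high_mean, low_mean); a subgroup with no members yields None.
    """
    high = [change for level, change in participants if level >= threshold]
    low = [change for level, change in participants if level < threshold]
    return (mean(high) if high else None, mean(low) if low else None)
```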

Engagement Outcomes
Many researchers sought to investigate the effect of engagement behaviors on other engagement outcomes (53%, 9/17). In their study examining engagement with a weight loss app, Serrano et al applied classification and regression tree methods to identify subgroups with unique engagement behaviors [79]. They were able to distinguish highly engaged subgroups by the number of customizations made to the diet and exercise features of the app. Ben-Zeev et al found that participants who engaged with their schizophrenia self-management app for a period of 5 to 6 months also had a higher frequency of interactions and engaged 4.3 days per week on average [37]. Torous et al also characterized engagement for a schizophrenia self-management app through fitting frequency of interaction data to a piecewise power law distribution [44]. They found that future use of the app was directly related to prior app use, suggesting that those who engage with the app more often will have a higher probability of app engagement in the future. In their trial of a caloric-monitoring app for type 2 diabetes self-management, Goh et al applied latent-class growth modeling to delineate 8-week trajectories of app engagement [63]. They were able to identify 3 distinct app trajectories based on the frequency of interactions and also associate patient characteristics with these trajectories. In summary, there were strong predictive relationships between numerous engagement domains. This finding motivates establishing complementary domains across multiple contexts to optimize data triangulation.
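A much simpler check than the power law fitting or latent-class growth modeling used in these studies is to correlate each user's early engagement with their later engagement. The sketch below computes a Pearson correlation coefficient, assuming hypothetical per-user interaction counts for an early period (eg, weeks 1 to 2) and a later period (eg, weeks 3 to 4). It is a substitute illustration of the prior-use and future-use relationship, not the method of any reviewed study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between paired lists, eg early vs later interaction counts per user."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when either list has no variance")
    return sxy / sqrt(sxx * syy)
```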

Utilization Outcomes
Two studies proposed to evaluate the impact of engagement patterns on health care utilization outcomes (12%, 2/17). Kaplan et al plan to examine the impact of sustained engagement over time with an app for pediatric cystic fibrosis and inflammatory bowel disease self-management on the number of hospitalizations and emergency department visits [68]. However, they anticipate that changes in these outcomes may not be realized in a 6-month intervention period. Our research group is evaluating a prostate cancer survivorship app [71] and aims to investigate how (1) the number of patient-reported outcome measures completed and (2) the frequency of interactions logged relate to the number of in-clinic visits for prostate cancer-related concerns. Altogether, the limited sample of reviewed studies suggests that the relationship between engagement and utilization outcomes is underdeveloped and warrants further study.
The Fisher exact test of independence indicated that studies of structured apps were more likely to only report descriptive statistics on engagement data (7/7, P=.04). In addition, most studies that applied inferential statistics also measured the frequency of interactions logged (16/17, P=.014). Most researchers who did not segment users into cohorts based on engagement data only reported descriptive statistics on their engagement data (13/14, P<.001), while researchers who segmented their users into cohorts were more likely to conduct subgroup analyses and infer properties of the larger clinical population (14/19, P<.001). Table 4 provides a descriptive overview of studies applying descriptive or inferential analyses on engagement data.

Principal Findings
In conducting this scoping review, we sought to catalog the range of analytic indicators being used in evaluations of consumer mHealth apps for chronic conditions. We applied Arksey and O'Malley's methods of reporting and provided a descriptive analysis of the extent, nature, and distribution of analytic indicators across 41 studies, as well as a narrative and thematic summary of collected data [27]. The average mHealth evaluation included for review was a two-group pretest-posttest randomized controlled trial (RCT) of a hybrid-structured app for mental health self-management; it had 103 participants, lasted 5 months, did not provide access to health care provider services, measured 3 analytic indicators of engagement, segmented users based on engagement data, applied engagement data for descriptive analyses, and did not report on attrition.

Analytic Indicators
Our results indicate that researchers are measuring engagement across 7 analytic indicators, specifically: (1) the number of measures recorded, (2) the frequency of interactions logged, (3) the number of features accessed, (4) the number of log-ins or sessions logged, (5) the number of modules or lessons started or completed, (6) time spent engaging with the app, and (7) the number or content of pages accessed. We found that researchers favored evaluating the number of measures recorded on an app as an indicator of engagement, closely followed by the frequency of interactions logged. We also found that both these indicators were most often used to assess hybrid and unstructured apps; these 2 app structures also made up the majority of apps under review.
We noted that researchers were least likely to measure the number of pages accessed and time spent engaging with the app; the latter indicator was mostly reported descriptively (73%, 8/11). This finding was surprising given the historical popularity of these indicators for measuring engagement with Web-based interventions [17,23,80]. The breadth and duration categories that conceptually comprise these analytic indicators were also deprioritized. We propose that these indicators are falling out of favor because of a growing recognition that users engage with apps differently than with websites. Users perceive apps to be a short-term commitment [81] and access app-based content sporadically and for shorter periods of time compared with Web-based interventions [82]. Recent research by Morrison et al compared patterns of engagement with a stress management intervention delivered via website versus app. They accounted for these differences by substantially reducing the number of pages in the app version of the intervention compared with the website [83]. They subsequently found that app users logged in twice as often but spent half as much time engaging compared with website users. They did not report the number of pages accessed or time spent engaging with the app as indicators of engagement. This body of research, in conjunction with our own findings, suggests that researchers evaluating mHealth apps for self-managing chronic conditions should refrain from measuring and reporting these 2 analytic indicators of engagement unless they are expressly relevant to the app under study.
Our identification of the number of measures recorded on an app as an analytic indicator of engagement deviates from previous research by Sieverink et al on usage and adherence to eHealth interventions [17], which found no evidence that researchers were operationalizing constructs in this way. Our focus on reviewing studies of mHealth apps for self-managing chronic conditions may explain this difference, as these interventions encourage users to systematically record data and capture the variability of their disease state over time [84]. The prominence of the frequency of interactions logged as an analytic indicator may, in turn, reflect a shift toward on-demand apps with features and functionality that users can engage with at their own discretion. For such apps, benchmarking engagement by time range (eg, interactions per day or week) provides more context on a user's intentions and needs than the total amount of engagement alone.
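To illustrate why benchmarking by time range adds context, the following Python sketch contrasts 2 hypothetical users. Both have the same interaction totals, but their temporal patterns differ. Several details are assumptions for illustration: the log format (a user ID plus an ISO 8601 timestamp), the function name, and the choice of calendar days and ISO weeks as time ranges. None of them is a specification used by the reviewed studies.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical interaction log: (user_id, ISO 8601 timestamp) pairs.
LOG = [
    ("u1", "2017-03-01T08:00"), ("u1", "2017-03-01T08:05"),
    ("u1", "2017-03-01T08:10"), ("u1", "2017-03-01T08:15"),
    ("u2", "2017-03-01T09:00"), ("u2", "2017-03-08T09:00"),
    ("u2", "2017-03-15T09:00"), ("u2", "2017-03-22T09:00"),
]


def frequency_profile(log):
    """Per user: total interactions plus distinct active days and ISO weeks."""
    totals = defaultdict(int)
    days = defaultdict(set)
    weeks = defaultdict(set)
    for user, stamp in log:
        ts = datetime.fromisoformat(stamp)
        totals[user] += 1
        days[user].add(ts.date())
        weeks[user].add(tuple(ts.isocalendar())[:2])  # (ISO year, ISO week)
    return {
        user: {
            "total": totals[user],
            "active_days": len(days[user]),
            "active_weeks": len(weeks[user]),
        }
        for user in totals
    }


for user, profile in frequency_profile(LOG).items():
    print(user, profile)
```

Both users logged 4 interactions. The first did so in a single sitting, whereas the second interacted once a week for 4 weeks. A total-count indicator alone would not distinguish them.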
We did not observe any significant differences between the number or type of analytic indicators used to measure engagement across chronic conditions. Researchers applied indicators that were relevant to the features and functionality of their app. For example, studies of apps for diabetes self-management often measured the number of blood glucose readings due to the popularity of this feature but never measured the number of modules or lessons because these features were not offered to users. In a recent review on the barriers and facilitators of engagement with remote measurement technology for managing health, Simblett et al found that studies were reporting idiosyncratic measures of engagement and adherence that were not comparable across studies [26]. Their findings align with our own, and support Yardley et al's assertion that effective engagement is defined in relation to the purpose of a specific intervention and can only be established empirically in the context of that intervention [14]. Although Simblett et al call for less variation in how engagement is quantitatively measured across studies, we propose that researchers continue to apply context-specific analytic indicators but report them more systematically to enable cross-study comparison. Researchers might consider categorizing indicators according to the 7 domains identified in this research and providing detailed specifications on the analytic tags required to implement each indicator. When reporting on indicators, researchers should specify that they are measuring the construct of engagement and then catalog each domain. This practice may contribute to greater taxonomic consensus by curbing the arbitrary reporting of engagement-related constructs identified in this review.
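One way to keep context-specific indicators comparable is to record each one against its domain and the analytic tags that implement it. The Python sketch below shows a hypothetical reporting structure, not an established standard. The class and field names, the example indicators, and the tag names are illustrative assumptions. Only the 7 domain labels come from this review.

```python
from dataclasses import dataclass
from typing import List, Tuple

# The 7 analytic indicator domains identified in this review.
DOMAINS = (
    "number of measures recorded",
    "frequency of interactions logged",
    "number of features accessed",
    "number of log-ins or sessions logged",
    "number of modules or lessons started or completed",
    "time spent engaging with the app",
    "number or content of pages accessed",
)


@dataclass(frozen=True)
class IndicatorSpec:
    """A study-specific indicator reported against a shared domain (hypothetical schema)."""
    name: str                        # study-specific indicator name
    domain: str                      # must be one of DOMAINS
    analytic_tags: Tuple[str, ...]   # hypothetical event names the app emits
    construct: str = "engagement"    # the construct being measured, stated explicitly

    def __post_init__(self) -> None:
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain!r}")


def report_by_domain(specs: List[IndicatorSpec]) -> List[Tuple[str, List[str]]]:
    """Group indicator names by domain, in the fixed domain order, skipping empty domains."""
    rows = []
    for domain in DOMAINS:
        names = [s.name for s in specs if s.domain == domain]
        if names:
            rows.append((domain, names))
    return rows


# Hypothetical diabetes self-management app.
specs = [
    IndicatorSpec("blood glucose readings", DOMAINS[0], ("glucose_reading_saved",)),
    IndicatorSpec("days with any interaction", DOMAINS[1], ("app_interaction",)),
]
for domain, names in report_by_domain(specs):
    print(f"{domain}: {', '.join(names)}")
```

Because each study keeps its own indicator names and tags but reports them under a fixed set of domains, the structure aims to allow cross-study comparison without forcing identical measures.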

Application of Engagement Data
Although researchers measured, on average, 3 indicators in a single study, the majority reported findings descriptively and did not further investigate how engagement with an app contributed to its impact on health and well-being. This finding suggests that researchers are gaining nuanced insights into how users are engaging with their apps but are not conducting inferential analyses to characterize effective engagement for improved outcomes. Relating analytic engagement patterns to behavior change and intended outcomes has been advocated across the behavioral and computational sciences [14,15,24,85,86], with recent efforts made to equip researchers with strategies for performing inferential analyses on engagement data [22,87,88]. Our analyses indicated that studies of structured apps were more likely to only report descriptive statistics on engagement data. Given that structured apps primarily require users to follow a predetermined engagement pathway and complete a series of milestones, it is reasonable for researchers to report on completion rates and identify drop-off points. However, it may be helpful to conduct inferential analyses to understand whether completion of an app-mediated program is required to achieve intended outcomes, or whether users may derive proportional benefits from progressing through stages of the program. Of the studies that applied inferential statistics, most measured the number of days, weeks, or months users engaged with an app. This finding suggests that researchers consider a temporal understanding of engagement to be important in determining a predictive effect on intended outcomes.
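For structured apps, completion rates and drop-off points can be derived from the furthest module each user completed. The Python sketch below assumes a hypothetical input that maps each user to the highest module completed, with 0 meaning none. For each module, it counts how many users reached at least that point. The function names and cohort are illustrative, and the sketch covers only the descriptive step.

```python
from typing import Dict, List, Tuple


def completion_funnel(furthest_completed: Dict[str, int], n_modules: int) -> List[int]:
    """Number of users who completed at least module k, for k = 1..n_modules."""
    return [
        sum(1 for m in furthest_completed.values() if m >= k)
        for k in range(1, n_modules + 1)
    ]


def largest_drop_off(funnel: List[int], n_users: int) -> Tuple[int, int]:
    """Return (module, users lost) for the steepest drop from the previous step.

    The step before module 1 is the full enrolled cohort.
    """
    previous, best = n_users, (0, 0)
    for module, reached in enumerate(funnel, start=1):
        lost = previous - reached
        if lost > best[1]:
            best = (module, lost)
        previous = reached
    return best


# Hypothetical cohort of 6 users in a 4-module program.
furthest = {"a": 4, "b": 4, "c": 4, "d": 1, "e": 1, "f": 0}
funnel = completion_funnel(furthest, n_modules=4)
print(funnel)                                   # [5, 3, 3, 3]
print(largest_drop_off(funnel, len(furthest)))  # (2, 2)
print(funnel[-1] / len(furthest))               # 0.5 (completion rate)
```

An inferential follow-up would then relate the furthest module reached to outcome measures, rather than stopping at the funnel.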

Recommendations
In their systematic review, Sieverink et al found that over half of all reviewed studies measured adherence to eHealth interventions using a single analytic indicator, and a quarter used 2 indicators [17]. The authors conclude that a limited but deliberate set of only 1 or 2 different indicators, chosen in accordance with the goal of the technology, is sufficient to operationalize adherence. On reviewing how researchers were operationalizing adherence, they found that the majority reported adherence only in terms of how an intervention was used. Without a comparison to a threshold for intended use, this operationalization is incongruent with the definition of adherence. Instead, we propose that it aligns with the current understanding of engagement, which is more exploratory in nature and thus supports applying a greater number of analytic indicators.
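The distinction drawn here can be made concrete. An engagement indicator describes use, whereas an adherence indicator compares use against a predefined threshold for intended use. In the minimal Python sketch below, the function names, the weekly session counts, and the threshold are hypothetical values chosen for illustration. The threshold requires at least 3 sessions per week in at least 3 weeks.

```python
from typing import Dict, List


def engagement_summary(weekly_sessions: List[int]) -> Dict[str, float]:
    """Descriptive engagement: how the app was used, with no reference point."""
    return {
        "total_sessions": sum(weekly_sessions),
        "mean_sessions_per_week": sum(weekly_sessions) / len(weekly_sessions),
        "active_weeks": sum(1 for s in weekly_sessions if s > 0),
    }


def is_adherent(weekly_sessions: List[int], min_sessions_per_week: int, min_weeks: int) -> bool:
    """Adherence: use compared against a predefined intended-use threshold."""
    qualifying_weeks = sum(1 for s in weekly_sessions if s >= min_sessions_per_week)
    return qualifying_weeks >= min_weeks


user = [4, 3, 1, 0]  # hypothetical sessions in weeks 1-4
print(engagement_summary(user))
print(is_adherent(user, min_sessions_per_week=3, min_weeks=3))  # False
```

Both functions use the same log data. Only the second requires a defined notion of intended use, which is why reporting usage alone does not operationalize adherence.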
In contrast to Sieverink et al's findings, the majority of our reviewed studies applied between 2 and 4 analytic indicators to measure engagement. This difference suggests that researchers are starting to recognize a conceptual and methodological distinction between the constructs of engagement and adherence. From these findings, we make the following recommendation: researchers seeking to gain a preliminary understanding of how users are engaging with their app are encouraged to apply all relevant analytic indicators from those identified in this review. Multimedia Appendix 2 presents data that may support researchers in selecting indicators that have previously been measured for their target chronic condition or for an app with similar features and functionality. Once analytic findings are generated, researchers might consider segmenting users by engagement behaviors to interrogate the data and refine their engagement models. Conducting inferential subgroup analyses with engagement as a predictor of observed health outcomes might uncover potential patterns of effective engagement and inform an operationalization of intended use. In this way, measuring engagement can be positioned on a methodological continuum toward determining adherence. Figure 2 presents a process model of our recommendations.
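The segmentation and subgroup steps described above could begin with something as simple as the Python sketch below. It splits users into equal-sized engagement segments and compares mean outcome change across segments. It then estimates an ordinary least-squares slope with engagement as the single predictor. Several elements are hypothetical: the data, the choice of 3 segments, active weeks as the engagement measure, and the function names. The sketch also omits the significance testing and covariate adjustment that a real analysis would need.

```python
from statistics import mean
from typing import List, Tuple

Pair = Tuple[float, float]  # (engagement, outcome change) for one user


def segment_by_engagement(users: List[Pair], n_segments: int = 3) -> List[List[Pair]]:
    """Split users into near-equal segments ordered from lowest to highest engagement."""
    ranked = sorted(users, key=lambda u: u[0])
    size, remainder = divmod(len(ranked), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        end = start + size + (1 if i < remainder else 0)
        segments.append(ranked[start:end])
        start = end
    return segments


def ols_slope(xs: List[float], ys: List[float]) -> float:
    """Least-squares slope of outcome change on engagement."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical users: (active weeks, change in outcome score)
users = [(1, 0), (2, 1), (3, 1), (4, 2), (5, 3), (6, 5)]
for label, segment in zip(("low", "mid", "high"), segment_by_engagement(users)):
    print(label, mean(outcome for _, outcome in segment))
xs, ys = zip(*users)
print(round(ols_slope(list(xs), list(ys)), 3))  # 0.914
```

A monotonic rise in outcome across segments, together with a positive slope, would only generate a hypothesis about effective engagement. Confirming it requires the inferential analyses and, ultimately, the trials discussed in this review.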
During our full-text review, we excluded a large number of studies because they did not include objective, quantifiable measurements using log data analytics. Some studies had users self-report their engagement, whereas others omitted engagement altogether and reported only findings on app efficacy. One possible explanation for this gap is that researchers are unfamiliar with how to derive analytic insights from their app. In our experience, tagging interaction data to enable analytic insights requires deliberate foresight. A shared understanding between researcher and software developer of the research questions being answered is critical to determine how analytics data should be modeled. Multimedia Appendix 3 presents a use case for applying analytic tags to evaluate effective engagement.
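As an illustration of the foresight that tagging requires, the Python sketch below ties each analytic tag to the research question it serves. It refuses to log events that no question needs. The tag plan, event names, function name, and JSON-lines output are hypothetical choices for this example. They are not the use case in Multimedia Appendix 3 or the API of any particular analytics platform.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List

# Hypothetical tag plan agreed between researcher and developer:
# each research question maps to the events the app must emit.
TAG_PLAN: Dict[str, List[str]] = {
    "Do users record measures regularly?": ["measure_recorded"],
    "Which features are used?": ["feature_opened"],
    "Where do users drop out of the program?": ["module_started", "module_completed"],
}

ALLOWED_EVENTS = {event for events in TAG_PLAN.values() for event in events}


def log_event(sink: List[str], user_id: str, event: str, **properties: Any) -> Dict[str, Any]:
    """Append one analytics event as a JSON line; reject events outside the tag plan."""
    if event not in ALLOWED_EVENTS:
        raise ValueError(f"event {event!r} is not in the tag plan")
    record = {
        "user_id": user_id,
        "event": event,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "properties": properties,
    }
    sink.append(json.dumps(record, sort_keys=True))
    return record


sink: List[str] = []
log_event(sink, "participant-01", "module_completed", module=2)
print(json.loads(sink[0])["event"])  # module_completed
```

Rejecting unplanned events keeps the log aligned with the research questions. It also makes gaps visible early, while the app is still in development rather than after data collection.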
Our final recommendation concerns the reporting of attrition in data-driven mHealth evaluations. In 2005, Eysenbach published landmark work on the law of attrition [10], describing his observation that a substantial portion of participants in eHealth trials stop using the intervention before the study ends. He posits that attrition is a fundamental characteristic and methodological challenge in the evaluation of eHealth interventions and recommends that "usage metrics and determinants of attrition should be highlighted, measured, analyzed, and discussed" [10]. Our findings suggest that this counsel has not fully translated into practice in the mHealth field. Researchers appear less inclined to log and report analytic indicators of disengagement than indicators of engagement. We encourage researchers to attribute the same value to attrition data as they currently do to engagement data, as both constructs provide consequential insights into the viability of an app in the real world.
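Nonusage attrition can be summarized from the same log data used for engagement. The Python sketch below assumes a hypothetical input recording each participant's last week of app activity. It counts how many participants remain active at each week of the study, which is the basis for plotting an attrition curve. The function names and data are illustrative. Treating a participant as lost after their last recorded activity is an assumption of this example.

```python
from typing import Dict, List


def active_participants_by_week(last_active_week: Dict[str, int], n_weeks: int) -> List[int]:
    """Participants whose last recorded activity is in week w or later, for w = 1..n_weeks.

    Participants who never used the app can be given a last active week of 0.
    """
    return [
        sum(1 for last in last_active_week.values() if last >= week)
        for week in range(1, n_weeks + 1)
    ]


def nonusage_attrition(curve: List[int], n_enrolled: int) -> float:
    """Share of enrolled participants no longer active in the final study week."""
    return 1 - curve[-1] / n_enrolled


# Hypothetical 12-week study with 5 enrolled participants.
last_week = {"p1": 12, "p2": 12, "p3": 6, "p4": 3, "p5": 1}
curve = active_participants_by_week(last_week, n_weeks=12)
print(curve)  # [5, 4, 4, 3, 3, 3, 2, 2, 2, 2, 2, 2]
print(nonusage_attrition(curve, len(last_week)))
```

Reporting the full curve, rather than only the final attrition share, shows when participants disengage. That timing can point to the determinants of attrition that Eysenbach recommends analyzing.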

Limitations
Some methodological limitations of our scoping review warrant discussion, the most significant being that we only reviewed articles published over a 2-year period. This sampling frame may not have captured a representative sample of mHealth literature. As such, we may have missed relevant studies published before November 2015 and after November 2017 that would have met our eligibility criteria. While we acknowledge that our sampling frame is limited in scoping the entire field of mHealth, we believe it adequately captures how analytics are being applied within it. From our review of the literature before conducting our search, we identified a paucity of papers that focused on mHealth log data analyses. The systematic review on usage-based adherence to eHealth interventions conducted by Sieverink et al reviewed 62 papers, of which 7 were on smartphone-based interventions [17]. Of those 7 papers, 5 were published after 2016, and the other 2 were both published in 2013. Perski et al conducted a systematic review on engagement with digital behavior change interventions that comprised all studies up to November 2015 [11]. They reviewed 113 studies, of which 13 were on mobile phone-based interventions. Only 4 of those studies applied log data analyses to study engagement with the intervention. These insights confirm that our scoping review did not include all studies that applied log data analyses to study engagement with mHealth apps. However, they also suggest that the number of studies we omitted is small. Our sampling frame of November 2015 to November 2017 directly follows Perski et al's review and includes 41 studies to address our specific research questions. For these reasons, we posit that our sample is sufficiently robust to provide a representative understanding of how analytics are being applied to study engagement with mHealth apps.
Due to limited resources, only 1 reviewer conducted the electronic searches and screened all titles and abstracts against eligibility criteria, thereby potentially introducing bias. We did not assess the quality of included articles; however, this is in line with our review framework, which does not mandate this methodological practice. Finally, we did not map analytic indicators to the 14 identified engagement-related constructs for analysis. We acknowledge that conceptual differences exist between some of these constructs (eg, usage, feasibility, and adherence), and it is possible to use multiple constructs in the same study. However, we reviewed each construct and its analytic operationalizations separately during our data extraction process and could not discern significant differences. As such, we feel that we have included a homogeneous body of research in this review and provided accurate insights into how researchers have used analytic indicators to measure engagement.

Conclusions
To date, the potential for mHealth apps to positively impact chronic health outcomes has not yet been realized [89]. This is, in part, due to the difficulties of generating a solid evidence base to guide clinical, policy, and regulatory decision making [90]. Indeed, the mHealth field has been reproached for arguing that apps warrant digital exceptionalism given the iterative nature of their design and the prohibitive cost of trials compared with their perceived level of risk [91]. We propose that our review supports researchers in harnessing these natural attributes to conduct data-driven evaluations of digitally mediated behavior change. Without objective knowledge of how users engage with an app to care for themselves, the mechanisms of action that underlie complex models of digitally mediated behavior change cannot be identified.
Our proposed library of analytic indicators to evaluate effective engagement with consumer mHealth apps for chronic conditions may be of value to researchers as a resource to support their evaluative practice. Researchers can systematically incorporate these analytic indicators into their study measures by adding analytic tags to their app's source code, allowing them to measure engagement without creating user burden or reactivity. Once generated, these data can be used in inferential analyses to delineate relationships with observed health outcomes. Researchers can further interrogate these data by conducting rapid cycles of research and development to validate hypothesized models of effective engagement. On the basis of these insights, researchers can (1) build a cumulative body of evidence for how users should engage with their app to achieve intended outcomes, (2) incrementally improve their app to optimize effective engagement, and (3) determine the optimal digital dose of effective engagement with their app for validation in a definitive trial to meet required levels of evidence for procurement and distribution [92]. Successful implementation of these practices may elevate the discourse about these apps beyond coarse evaluations and monolithic policy recommendations against their value in health care.
Raising the standard of mHealth app efficacy through measuring analytic indicators of engagement may enable greater confidence in the causal impact of apps on improved chronic health and well-being. It is this opportunity afforded by data-driven research to close the gap between promised and realized health benefits that is most meaningful.