Implementation Outcome Scales for Digital Mental Health (iOSDMH): Scale Development and Cross-sectional Study

Background: Digital mental health interventions are being used more than ever for the prevention and treatment of psychological problems. Optimizing the implementation aspects of digital mental health is essential to deliver programs to populations in need, but validated implementation outcome measures for digital mental health interventions are lacking.

Objective: The primary aim of this study was to develop implementation outcome scales of digital mental health for the different levels of stakeholders involved in the implementation process: users, providers, and managers or policy makers. The secondary aim was to validate the developed scale for users.

Methods: We developed English and Japanese versions of the implementation outcome scales for digital mental health (iOSDMH) based on a literature review and panel discussions with experts in implementation research and web-based psychotherapy. The scales cover acceptability, appropriateness, feasibility, satisfaction, and harm as outcome measures for users, providers, and managers or policy makers. We then delivered an evidence-based intervention via the internet using UTSMeD, a website for mental health information, and surveyed its users (N=200). Exploratory factor analysis (EFA) was conducted to assess the structural validity of the iOSDMH for users. Satisfaction, which consisted of a single item, was not included in the EFA.

Results: The iOSDMH contains 19 items for users, 11 items for providers, and 14 items for managers or policy makers. Cronbach α coefficients indicated moderate internal consistency for acceptability (α=.665) and high consistency for appropriateness (α=.776), feasibility (α=.832), and harm (α=.777) of the iOSDMH for users. EFA revealed a 3-factor structure, indicating that acceptability and appropriateness are closely related concepts. Despite this similarity, we retained acceptability and appropriateness as separate factors, following previous studies.

Conclusions: We developed the iOSDMH for users, providers, and managers or policy makers. Psychometric assessment of the scales for users demonstrated acceptable reliability and validity. Evaluating the components of digital mental health implementation is a major step forward in implementation science.


Introduction
Background Due to rapid advances in technology, mental health interventions delivered using digital and telecommunication technologies have become an alternative to face-to-face interventions. Digital mental health interventions vary from teleconsultation with specialists (eg, physicians, nurses, psychotherapists) to fully or partially automated programs led by web-based systems or artificial intelligence [1,2]. For example, internet-based cognitive behavioral therapy has been found useful for improving depression, anxiety disorders, and other psychiatric conditions [3][4][5]. Moreover, a recent meta-analysis suggested that internet-based interventions were effective in preventing the onset of depression among individuals with subthreshold depression, indicating future implications for community prevention [6]. Past studies have demonstrated that mental health interventions are suitable for digital platforms for several reasons: patients rarely need laboratory testing, the field of mental health faces a chronic shortage of human resources, and patients often experience stigma when consulting mental health professionals [7].
Although numerous studies have demonstrated the efficacy of digital mental health interventions, many people do not benefit from them, mainly due to insufficient implementation. Implementation is defined as "a specified set of activities designed to put into practice a policy or intervention of known dimensions" [8]. The entire care cascade can benefit from optimized implementation. People with mental health problems are known to face psychological obstacles to treatment [9], such as lack of motivation [9,10], low mental health literacy [11], or stigma [12]. Moreover, digital mental health interventions face high attrition and low adherence to programs, especially on open-access websites [13][14][15]. This may be because implementation aspects are not fully examined when interventions are developed. One major barrier is the lack of reliable and valid process measures, which are needed to monitor and evaluate implementation efforts. Core implementation outcomes include acceptability, appropriateness, feasibility, adoption, penetration, cost, fidelity, and sustainability [16,17]. However, most of these measures have not yet been validated. Weiner et al developed validated scales for acceptability, appropriateness, and feasibility [18], but these scales were not designed for digital mental health settings. A systematic review of implementation outcomes in mental health settings reported that most outcomes focused on acceptability and that other constructs remained underdeveloped, without psychometric assessment [19].
Moreover, implementation involves not only the patients targeted by an intervention but also individuals or groups responsible for program management, including health care providers, policy makers, and community-based organizations [8]. Providers have direct contact with users. Managers or policy makers have the authority to decide on the implementation of these programs.

Objectives
To the best of our knowledge, outcome measures for evaluating implementation aspects concerning users, providers, and managers or policy makers are not available in digital mental health research. Therefore, the primary aim of this study is to develop new implementation outcome scales for digital mental health (iOSDMH) interventions that can be applied to users, providers, and managers or policy makers. The secondary aim is to validate the implementation scale for users. This study does not include validation of the implementation scales for providers and managers because the study does not involve providers and managers.

Study Design
We originally developed the English and Japanese versions of the iOSDMH based on previously published literature [18,19], which proposed 3 measures of implementation outcomes and provided a systematic review of implementation outcomes. The development of the iOSDMH consisted of 3 phases. In the first phase, a literature review on implementation scales was conducted, and scales with high scores on evidence-based criteria were selected for further review. Each item in the item pool was critically reviewed by 3 researchers, who discussed whether the items were relevant for digital mental health. Based on the selected items, the team developed the first drafts of the scales for users, providers, and managers or policy makers. In the second phase, the draft of the iOSDMH was carefully examined by 2 implementation researchers and 1 mental health researcher. With these expert panels, the research team discussed the relevance of the selected items in each category as well as the wording of each question and created the second drafts of the scales. In the third phase, the draft of the iOSDMH was presented to implementation and digital mental health researchers to confirm the scales, and further changes were made based on their input. After confirming the relevance of the scales with the expert panels, we conducted an internet-based survey to examine the scale properties of the Japanese version of the iOSDMH for users. Although the iOSDMH targeted 3 categories of implementation stakeholders, namely users, providers, and managers or policy makers [8], tool validation was conducted for users only, as the study did not involve providers and managers.

Ethical Considerations
This study was approved by The Research Ethics Committee of the Graduate School of Medicine/Faculty of Medicine, University of Tokyo (No. 2019361NI). The aims and procedures of the study were explained on the web page before participants answered the questionnaire. Responses to the questionnaire were considered to constitute consent to participate.

Development Process of iOSDMH
The development of the iOSDMH consisted of 3 phases. In the first phase, 3 of the investigators (EO, NS, and DN) reviewed 89 implementation scales from previous literature and a systematic review of implementation outcomes [18,19]. After the review, we selected 9 implementation scales (171 items) that had been rated with evidence-based criteria in the following categories: acceptability of the intervention process, acceptability of the implementation process, adoption, cost, feasibility, penetration, and sustainability. Each item was reviewed carefully by the 3 researchers, and 4 instruments with high scores on psychometric and pragmatic quality were selected [20][21][22][23]. The following studies were considered relevant for measuring implementation aspects of digital mental health interventions: Moore et al [21] developed an assessment tool for the adoption of technology interventions; Whittingham et al [22] evaluated the acceptability of a parent training program; Hides et al [20] reported the feasibility and acceptability of mental health training for alcohol and drug use; and Yetter [23] reported the acceptability of psychotherapeutic interventions in schools. Relevant items were adapted for web-based mental health interventions, and those not relevant in the context of digital mental health were excluded.
The iOSDMH consisted of two parts: (1) evaluations and (2) adverse events of using digital mental health programs.
In the second phase, the drafts of the iOSDMH for users, providers, and managers were reviewed by experts on web-based psychotherapy (KI) and implementation science (MK and RV), and a consensus was reached to categorize all evaluation items into the concepts of acceptability, appropriateness, and feasibility. We initially had 22 items evaluating the use of digital mental health programs and 6 items on adverse events of the program for users; following discussions with the expert panels, we narrowed these to 14 items for evaluations and 5 items for adverse events. For the iOSDMH for providers, we first had 14 items for evaluations and 1 item for adverse events and then selected 10 items for evaluations and 1 item for adverse events. For the iOSDMH for managers, we first had 11 items for evaluations and 1 item for adverse events but changed them to 13 items for evaluations and 1 item for adverse events. Acceptability is the perception that a given practice is agreeable or palatable, such as feeling "I like it." The wordings of the items on acceptability (Items 1, 2, and 3 for users, and Item 2 for managers) were taken from Moore et al [21]. Item 3 for users and Items 1, 3, and 4 for managers were from Whittingham et al [22]. The wording of Item 4 for providers was from Yetter [23]. Appropriateness is the perceived fit, relevance, or compatibility of a practice, such as feeling "I think it is right to do." The wordings of Item 5 for users and Items 5 and 7 for managers were from Moore et al [21]. The wording of Item 8 for providers was based on Hides et al [20]. Items 4, 6, and 7 for users and Item 6 for managers were originally developed based on discussions. Item 9 for providers and Item 8 for managers were worded according to Whittingham et al [22]. Feasibility is the extent to which a practice can be successfully implemented [17]. The wordings of Items 8 and 9 for users, Item 7 for providers, and Item 12 for managers were from Moore et al [21].
Items 10, 11, and 13 for users and Item 8 for providers were from Hides et al [20]. Items 12 and 14 for users, Item 9 for providers, and Items 9, 10, and 11 for managers were originally developed based on discussions. In addition to the 3 concepts, we added 1 item on overall satisfaction to the evaluation section because overall satisfaction is considered important in implementation processes [17]. Previous literature distinguished satisfaction from acceptability, with acceptability being a more specific concept referring to a particular intervention and satisfaction usually representing general experience [16]. However, we considered overall satisfaction an important client outcome among process measures. The second part addressed harm (ie, adverse effects of interventions). Burdens and adverse events in using digital programs should be considered because digital mental health interventions are not harm free [24].
In the final step, the second drafts of the iOSDMH for users, providers, and managers were reviewed by 2 external researchers (PC and TS), 1 a digital mental health researcher and 1 an implementation researcher, and corrections were made based on discussions. We recognized that the relevance of some items differed according to the cultural contexts of respondents. For example, Item 2 on acceptability for users, Item 3 on acceptability for providers, and Item 2 on acceptability for managers asked whether using the program would improve the respondents' social image, or their evaluation of themselves or their organizations. Improving social image may be important and beneficial in some cultural groups but less so in others. Researchers from 3 different countries considered these items relevant, and therefore, we preserved them. All coauthors engaged in a series of discussions until a consensus was reached on whether the items reflected the appropriate concepts, as well as on the overall comprehensiveness and relevance of the scale. No objective criteria were adopted in the process of reaching consensus.
The iOSDMH was developed to target the 3 groups involved in the implementation process: users (ie, patients), providers, and managers or policy makers. Providers are people who have direct contact with users (eg, a nurse in a medical setting or a person in charge in a workplace). Managers or policy makers are people who have the authority to decide on the implementation of a program (eg, the responsible person). These scales do not restrict the study setting (eg, clinic, workplace, or school). For example, the implementation of workplace-based interventions may involve workers (users), human resource staff (providers), and company owners (policy makers). Moreover, these scales aim to evaluate the implementation aspects related to users, providers, and managers after the users have completed, or at least partially received, the internet-based intervention. Most items were developed assuming that users had prior experience of receiving the internet-based intervention. The process of developing the iOSDMH is shown in Figure 1.

Internet-Based Survey
Participants were recruited through an internet-based crowdworking system (CrowdWorks, Inc), which has more than 2 million registered workers. The eligibility criterion was being over 20 years of age. Participants were required to learn from the self-help information website UTSMeD [25], a digital mental health intervention. The UTSMeD website was developed to help Japanese general workers cope with stress and depression and contains self-learning psychoeducational information on mental health (eg, stress management). This web-based UTSMeD intervention has proven effective in reducing depressive symptoms and improving work engagement among Japanese workers in previous randomized controlled trials [26,27]. In our study, participants were asked to explore the UTSMeD website for as long as they liked and to take quizzes on mental health. They answered the Japanese version of the iOSDMH for users (14 items over 2 pages) after achieving an acceptable score on the quizzes (ie, 8 or more of 10 questions answered correctly). Participants received web-based points as incentives for participation. As the current UTSMeD is an open-access website and the authors provided the URL directly to participants, the study did not involve any providers, managers, or policy makers; the psychometric assessment was thus limited to users. Demographic information comprised gender, age, marital status, educational attainment, income, work status, occupation type, and employment contract. The target sample size was set at 10 times the number of items to obtain reliable results (approximately 200 participants). The survey was conducted through the internet-based crowdworking system, and completed answers were obtained without missing data.

Statistical Analysis
To assess the internal consistency of the Japanese iOSDMH, Cronbach α coefficients were calculated for all scales and each of the 4 subscales (acceptability, appropriateness, feasibility, and harm). To assess structural validity, exploratory factor analysis (EFA) was conducted because previous studies have shown that acceptability and appropriateness are conceptually similar [16,18]. EFA was conducted by excluding 1 item of overall satisfaction, as the concept of satisfaction cannot be applied to each of the 4 subscales. We extracted factors with eigenvalues of more than 1, following the Kaiser-Guttman "eigenvalues greater than one" criterion [28], using the least-squares method with Promax rotation. Items with factor loadings above 0.4 were retained [29].
Statistical significance was defined as P<.05. All statistical analyses were performed using the Japanese version of SPSS 26.0 (IBM Corp).
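As an illustration, the two criteria described above can be expressed in a few lines of code. The analysis itself was performed in SPSS; the snippet below is only a minimal Python sketch using synthetic Likert-type data (all variable names and the simulated responses are ours, not study data), showing how Cronbach α is computed for a subscale and how the Kaiser-Guttman rule counts eigenvalues of the inter-item correlation matrix that exceed 1:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

def kaiser_guttman_factors(items: np.ndarray) -> int:
    """Number of factors to retain: eigenvalues of the correlation matrix > 1."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(items, rowvar=False))
    return int((eigvals > 1).sum())

# Synthetic 4-point Likert responses: 200 respondents, 3 correlated items
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))                    # shared latent trait
raw = latent + rng.normal(scale=0.8, size=(200, 3))   # item-specific noise
likert = np.clip(np.round(raw - raw.min() + 1), 1, 4)

alpha = cronbach_alpha(likert)
n_factors = kaiser_guttman_factors(likert)
```

Because the 3 simulated items share a latent trait, α is high and a single eigenvalue exceeds 1; in the actual analysis, least-squares extraction with Promax rotation was then applied to the retained factors.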

Development of iOSDMH
The final version of the iOSDMH for users contained 3 items for acceptability, based on Moore and Whittingham [21,22]; 4 items for appropriateness, 1 of which was based on Moore [21] and the other 3 were original; 6 items for feasibility, 5 of which were based on Moore and Hides [20,21] and 1 was original; 5 original items for harm; and 1 item for overall satisfaction. The iOSDMH instructs, "Please read the following statements and select ONE option that most describes your opinion about the program." The response to each item was scored on a 4-point Likert-type scale ranging from 1 (disagree) to 4 (agree). The iOSDMH for providers and managers or policy makers has an additional option, 5 (don't know). Details are provided in Multimedia Appendix 1.
The final version of the iOSDMH for providers contained 3 items for acceptability, 2 of which were based on Yetter [23], and 1 item was original; 3 items for appropriateness, 2 of which were based on Yetter [23], and 1 item was original; 3 items for feasibility, 1 of which was original and 2 were based on Moore [21] and Hides [20]; 1 original item for harm; and 1 for overall satisfaction. For acceptability, Item 1 evaluated the providers' perceived acceptance of the program for protecting the mental health of its users, whereas Items 2 and 3 focused on their own acceptability to implement the program in their workplace. For appropriateness, Items 4 and 6 asked about the providers' perceived appropriateness of the program for users, whereas Item 5 asked about the appropriateness of the program considering the situation of the providers. For feasibility, Item 7 evaluated the providers' perception of the program's feasibility for users, and Items 8 and 9 focused on the willingness of providers to provide the program to users.
The final version of the iOSDMH for managers or policy makers contained 4 items for acceptability, 3 of which were based on Whittingham [22] and 1 on Moore [21]; 4 items for appropriateness, 2 of which were based on Moore [21], 1 on Whittingham [22], and 1 original; 4 items for feasibility, 1 of which was based on Moore [21] and the others original; 1 original item for harm; and 1 item for overall satisfaction. Similar to the iOSDMH for providers, each factor of the scale contained questions on managers' perceptions of implementation in terms of the conditions of users and providers, as well as of the managers themselves. For example, Items 1 and 2 asked about the acceptability of the program for the institution, whereas Item 3 focused on managers' perceived acceptability for providers, and Item 4 evaluated managers' perceived acceptability for users. For appropriateness, Items 5 and 7 focused on the appropriateness of the program for the institution, and Item 8 assessed the appropriateness of the program for users according to managers' perceptions. For feasibility, Items 9 and 10 examined the feasibility of the program for the institution as perceived by managers or policy makers, Item 11 evaluated managers' perceived feasibility for providers, and Item 12 evaluated managers' perception of feasibility for users.

Factor Structure of iOSDMH
The EFA results are shown in Table 3. EFA conducted according to the Kaiser-Guttman criterion yielded 3 factors: the first factor comprised the acceptability and appropriateness items, the second feasibility, and the third harm. All items showed factor loadings above 0.4 and were therefore retained.

Principal Findings
This study developed implementation outcome scales for digital mental health based on the existing literature and reviews by experts on web-based psychotherapy and implementation science. Our measurements included 3 key constructs of implementation outcomes from previous studies (acceptability, appropriateness, and feasibility) and additional constructs on harm and satisfaction that we considered necessary in the implementation process. Implementation researchers and mental health experts agreed that each instrument reflected its intended concept.
This study created implementation outcome measures for the people involved in the implementation process: users, providers, and managers or policy makers. According to the World Health Organization's implementation research guide, knowledge exchange or collaborative problem-solving should occur among stakeholders such as providers, managers or policy makers, and researchers [8]. A past study indicated that policy makers and primary stakeholders had decision frameworks that would produce different implementation outcomes [30]. Previous implementation outcome research has targeted only 1 or 2 of these groups; to our knowledge, few studies have produced outcome scales for different levels of stakeholders [19]. We believe that outcome measures should be tailored to each stakeholder group, as decision frameworks may differ among them. For example, users judge a program's appropriateness by considering whether it is suitable for their own situation, whereas providers may judge whether the program suits the circumstances of their users as well as themselves. Managers or policy makers will care whether the program is suitable for themselves and for users and providers. Similarly, although the length or frequency of a program may be important for feasibility among users, cost or institutional resources may be more important in assessing feasibility among managers or policy makers.
Psychometric assessment of the implementation outcomes showed good internal consistency for appropriateness, feasibility, and harm. Internal consistency for acceptability was lower than that for the other constructs (α=.665), possibly because the acceptability construct consisted of only 3 items. The EFA suggested a 3-factor solution in which the first factor combined acceptability and appropriateness; correlations between these 2 concepts were high. This finding is consistent with previous studies showing that acceptability and appropriateness are conceptually close [16,18]. For instance, it has been reported that perceived acceptability of treatment is shaped by factors such as appropriateness, suitability, convenience, and effectiveness [31,32]. However, other scholars argue that acceptability should be distinguished from appropriateness.
Proctor et al noted that an individual (ie, an end user) may perceive an intervention as appropriate but not acceptable, and vice versa [16]. Similarly, previous research on alcohol screening in emergency departments revealed that nurses and physicians found alcohol screening acceptable but not appropriate because the process was time-consuming, patients might object to it, and the nurses had not received sufficient training [33,34]. Distinguishing acceptability from appropriateness in such situations is essential because it helps focus on the right concept during implementation. We therefore decided to maintain the 4-factor questionnaire comprising acceptability, appropriateness, feasibility, and harm.
The strength of this study is that we selected concepts relevant to implementation research from the literature, modified them for electronic mental health settings, and improved the contents through discussions with expert panels. Moreover, this study developed a separate questionnaire for each of users, providers, and managers or policy makers, all of whom have an essential role in implementation [8]. Evaluating the implementation outcomes of different stakeholders will clarify their different perceptions of an intervention program, possibly leading to active knowledge exchange among users, providers, and managers or policy makers. Although our outcome measures need further evaluation, our study contributes to implementation research in digital mental health.
We acknowledge the following limitations of our study. First, it was vulnerable to selection bias. As we recruited participants via the internet for the psychometric validation study, they might not be representative of the general population in Japan. The participants may have been more familiar with web-based programs and may have had a better understanding of digital mental health programs. In addition, this study conducted psychometric assessment of the outcome scales for users only, because the intervention setting, in which interested individuals enrolled themselves in the program, did not involve any providers or managers. In a study setting involving providers and managers or policy makers, the iOSDMH for providers and managers or policy makers will be needed to evaluate implementation outcomes; we plan to evaluate these versions in a future intervention study (UMIN-CTR: ID UMIN 000036864). Another limitation is that criterion-related validity was not evaluated in the current psychometric assessment, and the item development process did not follow a formal theoretical framework. Future studies should evaluate criterion-related validity using other measures related to implementation concepts, such as the System Usability Scale or participation status in web-based programs. This study validated the Japanese version of the iOSDMH for users; additional studies are needed to validate the English version. In future studies, we plan to apply these outcome measures in several web-based intervention trials to assess whether the implementation outcomes predict completion rates and participant attitudes using digital access log information [35]. Although we included multiple researchers in the digital mental health and implementation science domains from different countries, the iOSDMH scales would become more robust with a larger and more diverse review team.
Finally, the setting in which we conducted the survey was an occupational setting (ie, for workers). Future studies should evaluate the scales in other settings (eg, clinical, school).

Conclusions
We developed implementation outcome scales for digital mental health interventions to assess the perceived outcomes for users, providers, and managers or policy makers. Psychometric assessment of the outcome scale for users showed acceptable reliability and validity. Future studies should apply the newly developed measures to assess the implementation status of the digital mental health program among different stakeholders and enhance collaborative problem-solving.