Effectiveness of an Immersive Telemedicine Platform for Delivering Diabetes Medical Group Visits for African American, Black and Hispanic, or Latina Women With Uncontrolled Diabetes: The Women in Control 2.0 Noninferiority Randomized Clinical Trial

Background: Medically underserved people with type 2 diabetes mellitus face limited access to group-based diabetes care, placing them at risk for poor disease control and complications. Immersive technology and telemedicine solutions could bridge this gap.
Objective: The purpose of this study was to compare the effectiveness of diabetes medical group visits (DMGVs) delivered in an immersive telemedicine platform versus an in-person (IP) setting and establish the noninferiority of the technology-enabled approach for changes in hemoglobin A1c (HbA1c) and physical activity (measured in metabolic equivalent of task [MET]) at 6 months.
Methods: This study is a noninferiority randomized controlled trial conducted from February 2017 to December 2019 at an urban safety net health system and community health center. We enrolled adult women (aged ≥18 years) who self-reported African American or Black race or Hispanic or Latina ethnicity and had type 2 diabetes mellitus and HbA1c ≥8%. Participants attended 8 weekly DMGVs, which included diabetes self-management education, peer support, and clinician counseling using a culturally adapted curriculum in English or Spanish. In-person participants convened in clinical settings, while virtual world (VW) participants met remotely via an avatar-driven, 3D VW linked to video teleconferencing. Follow-up occurred 6 months post enrollment. Primary outcomes were mean changes in HbA1c and physical activity at 6 months, with noninferiority margins of 0.7% and 12 MET-hours, respectively. Secondary outcomes included changes in diabetes distress and depressive symptoms.
Results: Of 309 female participants (mean age 55, SD 10.6 years; n=195, 63% African American or Black; n=105, 34% Hispanic or Latina; n=151 IP; and n=158 in VW), 207 (67%) met per-protocol criteria. In the intention-to-treat analysis, we confirmed noninferiority for primary outcomes. We found similar improvements in mean HbA1c by group at 6 months (IP: –0.8%, SD 1.9%; VW: –0.5%, SD 1.8%; mean difference 0.3, 97.5% CI –∞ to 0.3; P<.001). However, there were no detectable improvements in physical activity (IP: –6.5, SD 43.6; VW: –9.6, SD 44.8 MET-hours; mean difference –3.1, 97.5% CI –6.9 to ∞; P=.02). The proportion of participants with significant diabetes distress and depressive symptoms at 6 months decreased in both groups.
Conclusions: In this noninferiority randomized controlled trial, immersive telemedicine was a noninferior platform for delivering diabetes care, eliciting comparable glycemic control improvement, and enhancing patient engagement, compared to IP DMGVs.
Trial Registration: ClinicalTrials.gov NCT02726425; https://clinicaltrials.gov/ct2/show/NCT02726425


Introduction
Minority and low-income women with type 2 diabetes mellitus (T2DM) face widening disparities in diabetes care and clinical outcomes, highlighting the pressing need to improve diabetes care for underserved communities [1-5]. Diabetes medical group visits (DMGVs) are shared appointments where groups of patients receive diabetes self-management education (DSME), peer support, and a clinical visit within a 2-hour appointment. Compared to usual care for adults living with diabetes, the in-person (IP) DMGV model has been associated with improved diabetes outcomes and lower costs [6-9]. Moreover, receiving care as a group can reduce disparities by fostering more equitable patient-provider relationships, creating relationships of care between patients, and improving health literacy and self-management skills [10]. Yet, many disadvantaged communities report poor access to DMGVs as health systems find them difficult to implement [11,12]. Patient engagement in group-based diabetes care is also often low due to social stigma, lack of transportation, and time constraints [11,13].
Telehealth solutions have gained unprecedented traction with the onset of the COVID-19 pandemic. Early evidence has shown that virtual worlds (VWs) and virtual reality platforms are feasible and potentially more effective alternatives to IP programming [14,15]. A VW is a 3D, computer-based simulated environment where users engage in immersive, experiential learning with animated educational content [16]. Users create avatars, digital manifestations of self, to engage in peer group programming virtually [17]. This environment is intrinsically designed for users to enact behavioral change among peers and restructure old habits [18,19].
To our knowledge, the possibilities of avatar-based VW DSME have not been rigorously tested. We developed an immersive telemedicine platform, linking an interactive VW learning environment with videoconferencing software, to overcome the common barriers to diabetes group-based care while maintaining clinical effectiveness at scale. We implemented Women in Control 2.0 (WIC2) in 2015 to study the comparative effectiveness of delivering DMGVs in a VW versus the traditional IP classroom for women from Black, African American, Hispanic, or Latina backgrounds with uncontrolled T2DM (trial protocol in Multimedia Appendix 1) [20].

Trial Design
From February 2017 to October 2019, we recruited 17 cohorts of African American or Black or Hispanic or Latina women with uncontrolled T2DM. A total of 309 participants were enrolled and randomly assigned to the VW or IP DMGV conditions. Participants attended 8 weekly DMGVs and were followed for 6 months.

Participants
Eligible participants were adult women (≥18 years) who self-identified as African American, Black, Hispanic, or Latina with uncontrolled T2DM, defined by a hemoglobin A1c (HbA1c) value ≥8%. Participants were English-speaking or Spanish-speaking, had telephone access, permanent or stable housing, a clinician-supervised diabetes treatment plan, and could provide informed consent. Exclusion criteria included scheduling conflicts with DMGV programming, enrollment in another program, a history of diabetic ketoacidosis, oxygen-dependent chronic obstructive pulmonary disease, stroke within the last 6 months, and an acute coronary event or chronic heart condition within the last year. Pregnancy, recent glucocorticoid therapy, dialysis, active substance abuse, active cancer treatment, and any medical contraindications to study dietary recommendations were also exclusionary.

Overview
We identified participants from Boston Medical Center and a local community health center using a weekly electronic medical record query. We contacted eligible participants with an introductory letter and follow-up call [21]. Additional recruitment strategies included participant or provider referrals and posted flyers. We screened participants by phone and reconfirmed eligibility at an IP enrollment appointment. Participants provided written informed consent and were eligible for up to US $300 in compensation or a new laptop.

Randomization and Masking
After stratification by language, we used one-to-one block randomization (alternating blocks of 6 and 8) to assign participants to the VW or IP DMGV conditions. A biostatistician generated the randomization sequence, and randomization occurred after informed consent. We randomized participants prior to obtaining baseline data. Investigators were blinded to the randomization process, but assignments were revealed to participants and investigators post consent.
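The allocation scheme described above can be illustrated with a minimal sketch. It assumes permuted blocks that alternate between sizes 6 and 8, with one independent sequence per language stratum. The function name, the seeds, and the stratum sizes are hypothetical and are not taken from the trial's actual randomization program.

```python
import random

def block_randomize(n_participants, block_sizes=(6, 8), arms=("VW", "IP"), seed=0):
    """Return a 1:1 permuted-block allocation list for a single stratum."""
    rng = random.Random(seed)  # seeded so the sequence is reproducible
    allocations = []
    block_index = 0
    while len(allocations) < n_participants:
        # Alternate block sizes: 6, 8, 6, 8, ...
        size = block_sizes[block_index % len(block_sizes)]
        # Each block holds an equal number of assignments per arm.
        block = list(arms) * (size // len(arms))
        rng.shuffle(block)
        allocations.extend(block)
        block_index += 1
    return allocations[:n_participants]

# One sequence per language stratum (stratum sizes are illustrative only).
sequences = {
    "English": block_randomize(28, seed=1),  # blocks of 6 + 8 + 6 + 8
    "Spanish": block_randomize(14, seed=2),  # blocks of 6 + 8
}
```

Because every completed block is balanced, arm sizes within a stratum can differ by at most half a block at any point during enrollment.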

Intervention
Assigned to cohorts of 6-12 participants based on study arm, participants convened in clinical or virtual settings for 8 weekly DMGVs. Each session lasted approximately 120 minutes and started with the completion of an intake form to document acute or chronic symptoms, health system usage, and self-management activities, followed by the measurement of vital signs and the delivery of DSME. Sessions included a one-on-one clinical consult. Study clinicians were 4 board-certified physicians and 2 nurse practitioners. Nonclinical group facilitators received training on core DSME topics and facilitation skills from lead faculty (SEM, PG). All participants received a paper curriculum booklet.
Prior to the first virtual DMGV, staff provided laptops and wireless internet to VW participants and conducted IP computer training. All participants then met weekly for 8 weeks, according to a session schedule. During DMGVs, all participants received the same WIC2 curriculum, which was adapted from Power to Prevent [22] and consisted of 8 modules highlighting topics such as diabetes self-monitoring, preventative care, healthy eating, exercise, and stress management. Three bilingual staff, who were native Spanish speakers and included a certified interpreter, used the forward-backward method of translation, in tandem, to produce a culturally equivalent, Spanish-language curriculum [23,24]. The curriculum content was reviewed by 2 patient advisory groups. To develop avatar-driven learning experiences and incorporate chat and telehealth capabilities, instructional design was adapted for a VW environment using game design theory [25]. VW participants customized avatars to represent themselves and engage in DMGVs, including practicing positive health behaviors such as dance and social support (Figure 1).
During each session, a clinician met individually with participants (in a separate physical space or via secure telehealth platform or telephone, depending on study group) to review blood glucose readings and hyperglycemic or hypoglycemic data, conduct diabetes medication reconciliation, and address concerns. Recommendations for medication adjustments were based on an algorithm [26] and shared with primary care providers via progress notes in the electronic health record.
To ensure fidelity of DMGV protocols and standard operating procedures, we used checklists, audits of session recordings, and participant observation field notes.
Following the 8-week DMGV sessions, participants entered a 16-week maintenance period. They were encouraged to self-monitor (tracking blood glucose, blood pressure, diet, and exercise) using a paper booklet or mobile app. No formal DMGVs occurred.

Outcomes
The primary outcomes were mean changes in (1) HbA1c and (2) physical activity (metabolic equivalent of task [MET]-hours) by study arm from baseline to 6 months. Secondary outcomes included mean changes in HbA1c, physical activity, and medication changes at 9 weeks after enrollment and changes in

Data Collection and Management
Baseline data collection included sociodemographic characteristics, HbA1c values (blood draw), physical activity (per accelerometers), and survey measures. Baseline HbA1c was obtained within 30 days of the first DMGV. Physical activity was measured within a 14-day window, with participants wearing an accelerometer on the wrist for 7 consecutive days. Follow-up data collection occurred 9 weeks and 6 months post enrollment (within a 28-day window). Study data were stored using secure Research Electronic Data Capture (REDCap) software hosted by Boston University [32,33]. Unique study identification numbers were used to label all participant forms.

Sample Size Calculations
We used the average change over time in HbA1c and physical activity by study arm as coprimary outcomes, measured from baseline to 6-month follow-up. Data obtained from the WIC 1.0 pilot study were used to estimate the sample size necessary to establish the noninferiority of VW DMGVs compared to IP DMGVs at reducing HbA1c and increasing total physical activity levels [34]. For HbA1c, we assumed a noninferiority margin of 0.7% based on a clinically meaningful decrease [35,36], a pooled SD of 2, an α of .05, and a power of 80% based on the pilot study results [34]. For physical activity, we assumed a noninferiority margin of 12 MET-hours, a pooled SD of 35, an α of .05, and a power of 80%. We required 106 participants per arm. We did not expect the dropout rate for WIC2 to exceed 7%; thus, we aimed to enroll and randomize 228 and retain 212 participants.
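The stated assumptions can be checked with a short calculation. The sketch below assumes the standard normal-approximation formula for a two-arm noninferiority comparison of means with a true difference of zero, n = 2(z₁₋α + z₁₋β)²σ²/δ² per arm, and treats α as one-sided. The text does not state which formula or software was used, so the function `n_per_arm` is illustrative only. Under these assumptions, the physical activity outcome is the one that yields the reported 106 per arm.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(margin, sd, alpha=0.05, power=0.80):
    """Per-arm sample size for noninferiority of two means (normal approximation).

    Assumes a true between-arm difference of zero and a one-sided alpha.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / margin ** 2)

n_hba1c = n_per_arm(margin=0.7, sd=2)       # 101 per arm
n_activity = n_per_arm(margin=12, sd=35)    # 106 per arm
required = max(n_hba1c, n_activity)         # the larger requirement governs
retain_total = 2 * required                 # 212 across both arms
enroll_total = ceil(retain_total / (1 - 0.07))  # 228 after allowing 7% dropout
```

Taking the larger of the two per-arm requirements ensures that both coprimary outcomes are adequately powered.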

Statistical Analysis
Sociodemographic characteristics were compared by arm using chi-square and Fisher exact tests as appropriate for categorical variables and 2-sample t tests or Wilcoxon rank sum tests for continuous variables. Within-group changes from baseline to follow-up in mean HbA1c and mean physical activity in the VW and IP study arms were assessed with paired t tests; between-group changes were assessed with multiple linear regression models, both at an α level of .025 after applying a Bonferroni correction for multiple testing. Between-group differences in the likelihood of achieving a 0.4% reduction or more in HbA1c at follow-up were examined by logistic regression. One-sided P values and 97.5% CIs were calculated to assess the noninferiority hypothesis, using margins of 0.7% for HbA1c and 12 MET-hours for physical activity. Other statistical tests and confidence intervals were 2-sided. Per-protocol (PP) analyses were limited to participants who completed the protocol as intended, by attending ≥6 out of 8 DMGVs. PP and intention-to-treat (ITT) analyses were conducted on a full data set that used multiple imputation via predictive mean matching to impute missing baseline, 9-week, and 6-month primary and secondary outcomes. Analyses were replicated on unimputed data to check the sensitivity of results to imputation.
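The noninferiority decision rule described above can be sketched as follows. This is a simplified illustration that uses a normal approximation on a point estimate and its standard error, rather than the regression models fitted in the study. The function name, the standard errors, and the third example are hypothetical values chosen only to show the mechanics.

```python
from statistics import NormalDist

def noninferiority(diff, se, margin, higher_is_worse, alpha=0.025):
    """One-sided confidence bound, P value, and noninferiority decision.

    diff is the VW minus IP difference in mean change. For HbA1c a higher
    value is worse, so the upper bound is compared with +margin. For physical
    activity a lower value is worse, so the lower bound is compared with -margin.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    if higher_is_worse:
        bound = diff + z * se
        p_value = NormalDist().cdf((diff - margin) / se)
        return bound, p_value, bound < margin
    bound = diff - z * se
    p_value = 1 - NormalDist().cdf((diff + margin) / se)
    return bound, p_value, bound > -margin

# Hypothetical standard errors, used for illustration only.
hba1c = noninferiority(diff=0.3, se=0.15, margin=0.7, higher_is_worse=True)
activity = noninferiority(diff=-3.1, se=4.0, margin=12, higher_is_worse=False)
too_wide = noninferiority(diff=0.6, se=0.15, margin=0.7, higher_is_worse=True)
```

In this framing the null hypothesis is inferiority, so a small one-sided P value (equivalently, a confidence bound that stays inside the margin) supports noninferiority.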
Accelerometry data were used to calculate participants' mean change in physical activity behavior from baseline to 6 months. For each participant, we randomly selected the 2 weekdays with the longest wear-time. We considered missing wear-time data in a 24-hour day as sedentary activity. For each weekday, we estimated total MET-hours by a weighted sum of the number of hours in light (1.5 MET), moderate (4 MET), vigorous (6 MET), and very vigorous (8 MET) activity as measured by the accelerometer using the Freedson et al cut points [37,38]. We then averaged the estimated MET-hours for the 2 weekdays with the longest wear-time to obtain weekday average MET-hours per participant.
Sensitivity analyses were performed to evaluate the influence of language preference on our primary outcome results. Participant characteristics with, versus without, baseline HbA1c were assessed to detect potential bias from missing data. Characteristics of participants who adhered to the session protocol (attended ≥6 vs <6 sessions) were also assessed, and primary outcome PP analyses were replicated controlling for characteristics found to be correlated with protocol adherence. All analyses were performed using SAS/STAT software (SAS version 9.4; SAS Institute) or the R programming language (version 3.4.3; R Core Team).
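The MET-hour computation can be sketched in a few lines. The example assumes that hours have already been classified into intensity categories and that sedentary time contributes nothing to the weighted sum, since the text lists weights only for the four active categories. The data structure, the function names, and the example values are hypothetical.

```python
# MET weights per intensity category, as listed in the text.
MET_WEIGHTS = {"light": 1.5, "moderate": 4.0, "vigorous": 6.0, "very_vigorous": 8.0}

def daily_met_hours(hours_by_intensity):
    """Weighted sum of hours spent in each active intensity category."""
    return sum(MET_WEIGHTS[k] * hours_by_intensity.get(k, 0.0) for k in MET_WEIGHTS)

def weekday_average_met_hours(days):
    """Average MET-hours over the 2 weekdays with the longest wear time."""
    top_two = sorted(days, key=lambda d: d["wear_hours"], reverse=True)[:2]
    return sum(daily_met_hours(d["hours"]) for d in top_two) / len(top_two)

# Illustrative weekday records for one participant (not study data).
days = [
    {"wear_hours": 20, "hours": {"light": 6, "moderate": 2, "vigorous": 0.5}},
    {"wear_hours": 18, "hours": {"light": 8, "moderate": 1}},
    {"wear_hours": 10, "hours": {"light": 2}},
]
average = weekday_average_met_hours(days)  # (20.0 + 16.0) / 2 = 18.0
```

The change score for a participant would then be the 6-month weekday average minus the baseline weekday average.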

Ethics Approval
This study was conducted according to the CONSORT (Consolidated Standards of Reporting Trials) guidelines [39] and approved by the Boston University/Boston Medical Center Institutional Review Board (H-34220).

Study Population
Of 1960 potentially eligible patients, 1349 were screened, and 309 participants were randomized; 29 patients had a change in eligibility status before the first DMGV (Figure 2). The PP sample included 207 (108 VW and 99 IP; 67%) participants who met a priori criteria by attending ≥6 DMGVs. At baseline, participants' mean age was 55 (SD 10.6) years, their mean weight was 195 (SD 41.8) lb, and their mean physical activity was 104.1 (SD 34.3) MET-hours. All participants were female, with 63% (195/309) of African American or Black race and 34% (105/309) of Hispanic or Latina ethnicity (Table 1). A majority (219/309, 71%) were insured by Medicaid, Medicare, or both, and 59.6% (184/309) had home internet. The mean diabetes distress (DD) score was 2.3 (SD 1.0), which is moderately high, and the mean Patient Health Questionnaire-8 (PHQ-8) score for depressive symptoms was 5.5 (SD 5.0), which is mild. More IP participants owned a smartphone. Mean baseline HbA1c values differed between the VW and IP groups (mean 9.7%, SD 1.7% vs mean 10.2%, SD 1.8%, respectively). Participant characteristics with complete versus missing baseline HbA1c data and changes in eligibility status were compared (Tables S1 and S2 in Multimedia Appendix 2 [27-31]). Because no participant characteristic was identified as accountable for the imbalance in mean baseline HbA1c, we attributed the difference in baseline HbA1c to random imbalance and controlled for baseline HbA1c in our outcome analyses.
Table footnotes: d Assessed using the Patient Health Questionnaire-8 (PHQ-8), which ranges from 0 to 24 [29]. e Assessed using the Diabetes Distress Scale-17 [27,28].

Fidelity
The 17 study cohorts were conducted with 98% fidelity to the 8-week curriculum. The median number of sessions attended by participants was 6 in the IP arm and 7 in the VW arm. Among participants who attended WIC2 DMGVs, 98.2% (1618/1648 total events) completed the clinician consult and intake forms. In the VW condition, participants completed the clinical consult via telehealth (540/823, 65.6%), telephone (54/823, 6.6%), or either modality (230/823, 27.9%).

Coprimary Outcomes-ITT
Changes in HbA1c and physical activity are reported in Tables 2 and 3. In the ITT sample, we found within-group HbA1c improvements of 0.8% among IP participants, from 10.2% at baseline to 9.4% at 6 months, and 0.5% among VW participants, from 9.7% at baseline to 9.2% at 6 months. The between-group difference in HbA1c improvement was not statistically significant, and the VW approach was noninferior (mean difference across study arms 0.3, 97.5% CI -∞ to 0.3; P<.001). The upper limit did not cross the predetermined noninferiority margin of 0.7%. It would require a noninferiority margin of less than 0.3% to fail to reject the null hypothesis of inferiority.
For physical activity, IP and VW participants had mean within-group decreases of 6.5 MET-hours and 9.6 MET-hours, respectively. Between-group differences were not detected from baseline to post intervention. Still, the noninferiority of the VW approach was confirmed (mean difference across arms -3.1 MET-hours, 97.5% CI -6.9 to ∞; P=.02). It would require a noninferiority margin of less than 9 MET-hours to fail to reject the null hypothesis of inferiority.

PP Results
Among the 207 participants who attended at least 6 DMGVs, within-group mean HbA1c values improved by 0.7% among IP participants, from 10.1% at baseline to 9.4% at 6 months, and by 0.5% among VW participants, from 9.6% at baseline to 9.1% at 6 months. Noninferiority was confirmed with a mean difference of 0.2 (97.5% CI -∞ to 0.3; P<.001) in HbA1c across study arms. Improvements of ≥0.4% were achieved by 56% (56/99) of IP and 52% (56/108) of VW participants from baseline to 6 months, while more than one-third (75/207, 36.02%) achieved a 1% improvement. IP and VW participants' physical activity decreased, on average, by 5.2 MET-hours and 8.1 MET-hours, respectively. Noninferiority was confirmed in the PP sample (mean difference across arms -3.0 MET-hours, 97.5% CI -8.9 to ∞; P=.008). Similar to the ITT analyses, between-group changes in HbA1c and accelerometer-measured physical activity were not statistically significant.

Secondary Outcomes
We analyzed mean changes in HbA1c and physical activity data at 9 weeks from baseline, which were similar to the 6-month results (Table S3 in Multimedia Appendix 2). We compared the mean change in DD, depression symptom burden, physical functioning, patient activation, weight, and step count by study arm, and adjusted for the baseline values. We observed substantial within-group improvements in both study arms for total DD and some DD subscales (emotional burden, regimen, and interpersonal). We observed substantive but nonsignificant improvements in both study arms for depression symptom burden, physical functioning, and patient activation, and mixed results for weight (Table 4). Notably, the proportion of participants reporting moderate DD (scores of ≥2) decreased from 53% (156/294) to 33% (77/237), and the proportion with clinically meaningful depressive symptoms (scores of ≥5) decreased from 48% (142/295) to 40% (95/237) from baseline to 6 months (Table S4 in Multimedia Appendix 2). Physical activity assessment using step count revealed high baseline step counts, small decreases at 6 months, and overall null findings (Table 4). Results of sensitivity and unimputed analyses are in Tables S5-S10 in Multimedia Appendix 2.
Table footnotes: ITT: intention-to-treat. g Assessed using the Patient Health Questionnaire-8 (PHQ-8), which ranges from 0 to 24 [29]. h Assessed using the physical function subscale on the Patient-Reported Outcomes Measurement Information System (PROMIS-29) measure [30].

Lifestyle Behaviors
Self-management behaviors were assessed through a weekly self-report to detect changes in diet, exercise, and diabetes-related medication. Nearly a third of all participants (89/309, 28

Adverse Events
One study-related severe adverse event in the VW group occurred due to emotional distress.

Principal Findings
To our knowledge, this is the first fully powered clinical trial to demonstrate the effectiveness of delivering DMGVs using an immersive 3D telemedicine platform versus IP care. Both approaches were similarly effective in reducing mean HbA1c over 6 months. Our PP sample is indicative of a high patient retention rate. No significant changes in physical activity were detected. Altogether, this research demonstrates that 3D immersive telemedicine DMGVs are an effective alternative to IP group diabetes care for high-risk patients in a safety net health system.
Our preliminary study compared IP versus immersive DSME delivery among 89 low-income African American women with uncontrolled T2DM [34]. Results showed substantial improvements in mean HbA1c. Other pilot studies demonstrated positive impacts on patient outcomes but faced methodological limitations, including lack of a comparison group, small sample size, and inadequate power, and none used a DMGV format [40,41]. In contrast, the WIC2 study was randomized with an active DMGV control condition and fully powered to rigorously test the primary outcomes.
Nearly half of our participants at baseline had measurable depressive symptoms and diabetes distress. Prior research has revealed a strong correlation between depression, diabetes distress, and uncontrolled diabetes [42,43]. Interestingly, we found that the proportion of WIC2 participants with depressive symptoms (PHQ-8≥5) and diabetes distress (DD≥2) decreased from baseline to 6-month follow-up, indicating the WIC2 intervention improves glucose control and mental health. This finding is important as interventions that address both physical and mental health can reduce patients' treatment burden.
Given our pilot study showed increased physical activity among study participants, the null finding in physical activity in WIC2 was unexpected [34]. We experienced challenges with accelerometry wear due to participants' discomfort with the devices.
A literature review revealed that accelerometry-measured activity for middle-aged women with chronic disease has limitations [44], such that physical activity can be underestimated or inconsistent across the life span [37,45,46]. More research is needed to establish activity assessment guidelines for older adults.

Limitations
We acknowledge several study limitations. We had a small imbalance in HbA1c at baseline. After careful assessment of participant characteristics, it was determined that this imbalance occurred at random and was unrelated to the fidelity of the study protocol. It is not possible to rule out unobserved confounding of protocol adherence, such as participants' access to transportation, digital literacy, work or childcare conflicts, or financial constraints impacting access to medication. Finally, this study was conducted with women in an urban safety net health system, which may limit its generalizability. The ongoing challenges with access to digital resources and digital literacy for underserved communities may also limit the immediate generalizability of our study findings to similar populations.

Conclusions
Immersive technologies can reduce disparities by improving effectiveness and access to evidence-based diabetes care. We showed that when given the tools, adults from digitally underserved communities robustly adopt health technology tools with improved health outcomes. More effort is warranted to design technology tailored to the needs, capabilities, and life perspectives of diverse communities to avoid leaving behind those most in need of better health care.