Mobile Peer-Support for Opioid Use Disorders: Refinement of an Innovative Machine Learning Tool

Background: The majority of individuals with Opioid Use Disorder (OUD) do not receive any formal substance use treatment. Because engagement with and access to traditional treatment are limited, there is increasing evidence that patients with OUD turn to online social platforms to access peer support and obtain health-related information about addiction and recovery. Interacting with peers before and during recovery is a key component of many evidence-based addiction recovery programs and may improve self-efficacy and treatment engagement as well as reduce relapse. Commonly used online social platforms are limited in utility and scalability as an adjunct to addiction treatment, lack effective content moderation (e.g., of misinformed advice, maliciousness, or “trolling”), and lack the security and ethical safeguards inherent to clinical care. Methods: The present study will develop a novel, artificial intelligence (AI)-enabled mobile treatment delivery method that fulfills the need for a robust, secure, technology-based peer support platform for patients with OUD. Forty adults receiving outpatient buprenorphine treatment for OUD will be asked to pilot a smartphone-based mobile peer support application, the “Marigold App”, for six weeks. The study will use (1) a prospective cohort study to obtain text message content and feasibility metrics, and (2) qualitative interviews to evaluate the usability and acceptability of the mobile platform. Anticipated findings and future directions: The Marigold mobile platform will allow patients to access a tailored chat support group 24/7 as a complement to different forms of clinical OUD treatment. Marigold can keep groups safe and constructive by augmenting chats with AI tools capable of understanding the emotional sentiment in messages and automatically “flagging” critical or clinically relevant content.
This project will demonstrate the robustness of these AI tools by adapting them to catch OUD-specific “flags” in peer messages while also examining the adoptability of the platform itself among patients with OUD.


INTRODUCTION
Adjuncts to treatment for opioid use disorder (OUD) are urgently needed. Opioid overdoses are a leading cause of death for Americans under 50 years old, with recent years recording the most opioid overdose deaths on record [1,2]. The opioid epidemic has not abated despite a recent overall decrease in the number of opioid analgesics prescribed by US providers [3][4][5][6]. Heroin use is increasing for the first time in more than a decade [7][8][9][10][11][12][13], and overdose deaths due to potent synthetic opioids, such as fentanyl and carfentanil, are on the rise [14][15][16][17][18][19][20]. In light of this public health crisis, there is a pressing need for novel approaches to treat patients with OUDs.
Peer support programs may increase the efficacy of, and retention in, OUD treatment. Peer-based interventions have been shown to be an effective component of care provision across multiple health settings and conditions, including addiction [21][22][23][24][25][26]. There is strong evidence to support the role of peer-delivered behavioral interventions for OUD in both clinical and non-clinical settings (e.g., Narcotics Anonymous) [9].
Novel delivery methods can improve access to peer support. Technology is already used to augment addiction treatment programs, ranging from automated SMS reminders to self-guided online treatments [27][28][29][30][31]. However, these "self-help" mechanisms lack the advantage of two-way, peer-to-peer communication. Text-based peer support can be used to drive engagement and retention in structured treatment programs [28]. Text communication between peers can use pseudonyms (with only the moderator or provider aware of the patient's identity), thereby increasing patients' comfort. The process of typing and rereading one's messages may also improve awareness, akin to therapeutic "homework" such as journaling. Additionally, vulnerable populations, including low-income and low-literacy patients, have demonstrated a preference for the text-message medium [32,33]. Millions of individuals already use ad hoc online forums (e.g., Reddit™) to obtain peer support and health information, indicating demand for this type of platform. Unfortunately, existing sites are plagued by intentionally malicious or inappropriate users ("trolls") and lack connection to any larger clinical infrastructure or oversight. Current text-based psychotherapy apps such as Talkspace and BetterHelp offer only individual therapy, at a high cost (~$200/month) that is prohibitive for many patients [34,35].
Artificial Intelligence (AI) can offer continuous, real-time access to monitored peer support in a platform that is scalable and economical. AI involves analyzing large amounts of data with algorithms that automatically adjust, or "learn", as they are exposed to new information, drawing inferences that may not be apparent from standard analysis. Natural language processing (NLP) is the subset of AI focused on human language [36,37]. A body of previous work has established a computational link between language patterns and the emotional sentiment behind the language (e.g., self-harm, malicious intent) [36][38][39][40]. Correctly classifying the intended sentiment of a text message allows the application to "flag" specific content. Applying NLP to messages within the peer support groups may provide significant benefits to users by proactively detecting when they might need immediate provider attention and by ensuring that peer-to-peer interactions are appropriate.
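To make the idea of classifying a message and raising a "flag" concrete, the sketch below trains a multinomial naive Bayes classifier, a common textbook baseline for text classification, on a handful of invented messages. It is not Marigold Health's model, whose architecture is not described here; the `NaiveBayesFlagger` class, the labels "flag"/"ok", and the training examples are all hypothetical, and a usable model would need far more annotated data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    """Lowercase and keep runs of letters/apostrophes (deliberately simple)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesFlagger:
    """Hypothetical binary flagger: multinomial naive Bayes, add-one smoothing."""

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_score(self, tokens, label):
        n_docs = sum(self.class_counts.values())
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / n_docs)  # log prior
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.class_counts, key=lambda c: self._log_score(tokens, c))


# Invented toy training data, for illustration only
TRAIN = [
    ("I can't do this anymore, I want to end it", "flag"),
    ("I want to hurt myself tonight", "flag"),
    ("nobody would care if I was gone", "flag"),
    ("great meeting today, thanks everyone", "ok"),
    ("day 30 clean, feeling proud", "ok"),
    ("thanks for the support, see you tomorrow", "ok"),
]
clf = NaiveBayesFlagger().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(clf.predict("I want to end it all"))             # flag
print(clf.predict("thanks for the support everyone"))  # ok
```

Bag-of-words models like this one ignore word order and conversational context, which is one reason more expressive models are generally preferred for this kind of task.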
We propose a novel, machine-learning-enabled, mobile treatment delivery method that fulfills the need for a robust, secure, technology-based peer support platform. The present "Marigold App" is an online peer-support platform that offers layers of innovation not readily available in other treatment programs or online forums: (1) continuous, economical, real-time access to supervised peer support; (2) a monitored, high-fidelity mobile environment with built-in "flags" for specific risky behaviors; and (3) scalability. We plan to refine the existing application, currently used for mental health, to enable its use as an adjunct to treatment for OUD. This technology would allow outpatient treatment centers to overcome many patient-level barriers to retaining patients in OUD treatment (e.g., transportation, stigma) as well as provider-level barriers (e.g., lack of workforce for real-time, 24/7 monitoring of online conversations). We will develop a mobile application that allows patients to reach out for care at any time, from anywhere, with reduced concerns about stigma. The platform can also offer support during high-risk periods that occur outside of structured treatment settings, such as when a patient experiences cravings or urges to use substances. By enabling dynamic, continuous assessment of patient relapse risk, and by automatically notifying providers when a patient needs more intensive care resources, the platform will provide a safe and accessible format for peer support. Finally, the Marigold App has the potential for customization, including: (a) giving providers access to content discussed in peer support groups, improving their ability to address patient-specific concerns and needs; and (b) passively tracking patient progress over time and adapting the algorithms to peer-group-specific jargon. The NLP technology makes this platform uniquely scalable.
To our knowledge, it will be the first platform of its kind to model language nuances specific to OUD and relapse. These machine-learning algorithms can provide automation and scale that human monitoring of 24/7 text conversations could not match, allowing existing peer recovery specialists to support larger caseloads.

Specific Aims
There are two specific aims of this project. First, we aim to develop and optimize NLP algorithms to detect text message content that may signal relapse or impending relapse in patients in recovery from OUD. The goal is for the NLP-based algorithms to approximate what a human (specifically, a clinician) would consider concerning text message sentiment. For instance, if a user messages "I feel like I can't do this anymore", the NLP-based algorithm should be both sensitive and specific to the sentiment in the text, not to the outcome per se (e.g., high suicide risk). Outcome assessment (e.g., of suicide risk) would still rely on a clinician. Future work could evaluate and validate whether text message sentiment can accurately predict adverse mental health outcomes such as suicide risk or relapse. In future work, we will attempt to uncover correlations between language and specific sentiment in four domains: (1) recurrent use of opioids or other substances, (2) craving or urge to use opioids, (3) pain, and (4) negative affect. The first two domains are explicit signals of relapse or impending relapse, whereas perception of poorly controlled pain and negative affect are predictive of relapse. Message data generated by participants will be anonymized, tagged, and categorized by speech analysts to provide training data for identifying OUD-specific flags. The NLP models will then be tested on our existing database of 100,000 text messages. To yield enough data for effective AI procedures, each participant will need to send, on average, 1-2 messages per day. Second, we aim to demonstrate the feasibility and acceptability of Marigold Health's mobile peer support app as an adjunct to existing OUD treatment. In accordance with our prior work, primary outcomes will be feasibility and acceptability.
We will also interview users of the app to explore qualitative satisfaction and to obtain feedback on user experience and feature refinement, using Marigold Health's and the Co-Is' standard qualitative assessment protocols.
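One way to represent the annotation target of the first aim is a multi-label record per anonymized message, since a single message can touch several domains at once (e.g., craving and pain). The sketch below is a hypothetical schema: `Domain`, `AnnotatedMessage`, `signal_type`, and the pseudonym-masking helper `anonymize` are illustrative names, and masking known pseudonyms is only one small part of de-identification.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class Domain(Enum):
    RECURRENT_USE = "recurrent use of opioids or other substances"
    CRAVING = "craving or urge to use opioids"
    PAIN = "pain"
    NEGATIVE_AFFECT = "negative affect"


# Per the aims: the first two domains are explicit relapse signals,
# the last two are predictors of relapse.
EXPLICIT_SIGNALS = {Domain.RECURRENT_USE, Domain.CRAVING}


@dataclass
class AnnotatedMessage:
    message_id: str
    text: str  # assumed to be anonymized before annotation
    domains: set = field(default_factory=set)

    def signal_type(self) -> str:
        """'explicit', 'predictive', or 'none', based on the tagged domains."""
        if self.domains & EXPLICIT_SIGNALS:
            return "explicit"
        if self.domains:
            return "predictive"
        return "none"


def anonymize(text: str, pseudonyms: list) -> str:
    """Replace known in-app pseudonyms with a placeholder token."""
    for name in pseudonyms:
        text = re.sub(rf"\b{re.escape(name)}\b", "[USER]", text, flags=re.IGNORECASE)
    return text


msg = AnnotatedMessage(
    "m1",
    anonymize("Sunflower22, my back is killing me and I keep thinking about using", ["Sunflower22"]),
    {Domain.PAIN, Domain.CRAVING},
)
print(msg.text)           # [USER], my back is killing me and I keep thinking about using
print(msg.signal_type())  # explicit
```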

Study Setting and Patient Population
Participants will be recruited to enroll in the Marigold App at Rhode Island Hospital's Lifespan Recovery Center, a hospital-affiliated outpatient clinic that offers medications for OUD (MOUD; e.g., buprenorphine) and ancillary supports for patients with OUD. In Rhode Island, the prevalence of OUD and the incidence of opioid overdose are among the highest in the country [41]. Similar studies have had high rates of recruitment and retention (>90% retention in studies of text-message-based behavioral health interventions) [42][43][44][45][46]. The Lifespan Recovery Center averages ~30 new patient admissions per month, and there is ample opportunity to recruit participants during treatment. This will allow us to observe interactions between individuals maintained in treatment and those who recently initiated treatment, which may yield additional insights. Additionally, the center has established a research protocol within the clinic to evaluate patient- and program-level outcomes: to date, 92% of patients approached have agreed to participate in research-related activities. Participant characteristics include: 40% female; mean age 44 (range 20-76); ethnicity, 20% Hispanic; race, 84% White, 7% Black/African American, 7% more than one race, 1% Native Hawaiian/Pacific Islander, and 1% American Indian/Alaskan Native.

Preliminary Work
Marigold Health's current mobile app (see Figure 1) allows text-based group therapy and peer support for individuals with depression and anxiety. The Marigold platform is ultimately intended for integration within the workflow and electronic health system of a care management team within a clinic or health system. The Marigold App can also integrate clinician-selected handouts, diagnostic forms, and polls to facilitate in-app assessment of users. After enrollment, users choose a support group, typically based on the characteristics of the group or after a one-on-one in-app chat with the group's moderator. Each support group consists of 5-9 users and a trained peer-support specialist. Patients' personal information is visible only to group moderators; peer users see only a user-chosen pseudonym. App moderators facilitate the flow of text conversations in-app (see Figure 2) and can text single or multiple group members.
The current version of the app is live at a dozen sites (~1,000 users), demonstrating the stability and reliability of the software. Marigold Health's NLP analytics are already highly accurate at detecting messages that require immediate provider intervention. NLP algorithms identify two "red flag" types of content within peer chats: (1) expressed or implied intent to harm self or others and (2) malicious conduct or "trolling". To determine when content requires clinical oversight, the NLP models learn correlations between language and specific sentiment by sorting sentences into binary (needs intervention, i.e., "flag", or not) or multi-class (level of severity) categories. Figure 3 depicts the performance of Marigold's models on these two classification tasks (binary and multi-class) compared with state-of-the-art research methods. In the binary case, the model predicts whether a given message needs moderator intervention. Moderator intervention occurs in cases of self-harm, harm to others, risk of relapse, etc. In the multi-class case, the model also predicts the severity of the message on a scale from 1 to 5. The evaluation metric is the macro-averaged F1 score. The F1 score is the harmonic mean of precision and recall and therefore takes both false positives and false negatives into account. It is computed for each class individually, and the per-class scores are then averaged with equal weight to give the final macro-averaged F1 score. We also show the average performance of human annotators attempting the same task.
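The macro-averaged F1 score can be computed directly from per-class counts of true positives, false positives, and false negatives. The sketch below assumes plain lists of labels; the `macro_f1` name and the convention of scoring a class as 0 when its precision or recall is undefined are choices made here, not details taken from Marigold's evaluation.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (harmonic mean of precision and recall)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    f1_scores = []
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


y_true = ["flag", "flag", "ok", "ok", "ok", "ok"]
y_pred = ["flag", "ok", "ok", "ok", "ok", "flag"]
print(macro_f1(y_true, y_pred))  # 0.625  (flag F1 = 0.5, ok F1 = 0.75)
```

Because every class counts equally, macro-F1 penalizes a model that neglects a rare class such as flagged messages, whereas plain accuracy can look high when most messages are unflagged. The same function applies unchanged to the 1-5 severity labels.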

Intervention Procedures
The study has two components: (1) a prospective cohort study to obtain text message content and feasibility metrics and (2) qualitative interviews to evaluate the usability and acceptability of the app itself. Participant recruitment will occur during the course of usual care at the Lifespan Recovery Center. The Marigold App will serve as an adjunct to usual care, which may include medication and additional services such as counseling, case management, and peer support. Participants will be eligible for enrollment if they are English-speaking adults (≥18 years old) who meet DSM-5 diagnostic criteria for OUD per their treating provider at the Lifespan Recovery Center. The app is not currently available in languages other than English, although other languages are an area of future development. Participants will be excluded if they do not have an Android or iOS smartphone, are pregnant, are incarcerated, or are unable to provide informed consent.
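The inclusion and exclusion criteria above can be written as an explicit screening check, which may help keep screening consistent across research staff. This is a hypothetical sketch: the `Candidate` fields and the `is_eligible` function are illustrative names, not part of REDCap or the study's actual tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    age: int
    english_speaking: bool
    oud_dsm5_per_provider: bool  # DSM-5 OUD diagnosis per treating provider
    smartphone_os: Optional[str]  # "android", "ios", or None
    pregnant: bool
    incarcerated: bool
    able_to_consent: bool


def is_eligible(c: Candidate) -> bool:
    """Inclusion criteria must all hold and no exclusion criterion may apply."""
    meets_inclusion = c.english_speaking and c.age >= 18 and c.oud_dsm5_per_provider
    meets_exclusion = (
        c.smartphone_os not in {"android", "ios"}
        or c.pregnant
        or c.incarcerated
        or not c.able_to_consent
    )
    return meets_inclusion and not meets_exclusion


print(is_eligible(Candidate(34, True, True, "ios", False, False, True)))  # True
print(is_eligible(Candidate(34, True, True, None, False, False, True)))   # False
```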
The study research assistant will recruit directly from the population of patients actively engaged in OUD treatment at the Lifespan Recovery Center. All eligible patients will be approached to participate. After written informed consent is obtained, participants will complete a baseline assessment on touchscreen tablets using REDCap, a HIPAA-compliant online data collection system that allows direct, secure, and remote data entry [50]. Assessments will include measures of socio-demographic variables, medical history (including history of chronic pain and mental health), history of substance use, and treatment history (e.g., type and duration of treatment). Upon completion of the baseline assessment, the Marigold App will be downloaded onto the participant's smartphone. Participants will be placed in peer support groups of 5-9 in order of enrollment, on a rolling basis. Groups will be moderated by the study coordinator and research assistant. The research assistant will regulate the flow of conversation with standard text language developed by the study team, monitor for use, and respond to any generated "flags" according to the human subjects safety protocol. The small business (Marigold Health) has the infrastructure to monitor flagged content 24/7, and the investigative team will use an on-call system to respond in real time. All text messages (non-flagged content) will also be reviewed by the study team within 24 h. Participants must maintain a minimum level of weekly in-app activity (an average of >1 message/day) to be considered "active". Participants will be compensated $40 for initial enrollment and then $5 for each day that the app is actively used, up to $250 in total. Moderators will attempt to re-engage inactive users, though accounts will be deactivated after two weeks of inactivity.
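The activity, compensation, and deactivation rules above reduce to simple arithmetic over daily message counts. This sketch assumes that a day counts as "actively used" if the participant sent at least one message, that the $250 cap includes the $40 enrollment payment, and that "two weeks of inactivity" means 14 consecutive days without messages; these interpretations and all function names are hypothetical. Under these assumptions the cap is reached exactly at 42 active days, i.e., daily use for the full six weeks ($40 + 42 × $5 = $250).

```python
ENROLLMENT_PAYMENT = 40   # USD, paid once at enrollment
DAILY_PAYMENT = 5         # USD per actively used day
PAYMENT_CAP = 250         # USD; assumed to include the enrollment payment
INACTIVITY_LIMIT_DAYS = 14


def compensation(daily_messages):
    """Total payment given a list of per-day message counts."""
    # Assumption: a day with at least one message counts as an active day
    active_days = sum(1 for n in daily_messages if n > 0)
    return min(PAYMENT_CAP, ENROLLMENT_PAYMENT + DAILY_PAYMENT * active_days)


def weekly_active(week_messages):
    """An 'active' week averages more than 1 message per day."""
    return sum(week_messages) / len(week_messages) > 1


def should_deactivate(daily_messages):
    """True if the most recent 14 days contain no messages at all."""
    recent = daily_messages[-INACTIVITY_LIMIT_DAYS:]
    return len(recent) == INACTIVITY_LIMIT_DAYS and sum(recent) == 0


print(compensation([1] * 42))                # 250
print(compensation([2, 0, 1] + [0] * 11))    # 50
print(weekly_active([1] * 7))                # False (average of exactly 1 is not > 1)
print(weekly_active([2, 1, 1, 1, 1, 1, 1]))  # True
print(should_deactivate([3] + [0] * 14))     # True
```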
In the event of "trolling", the moderators will receive a flag and can respond in real time to individual users or escalate a behavioral health crisis to a local crisis team. Content flagged for suicidal or homicidal ideation is sent to a clinician for review in real time. Other flagged content (e.g., trolling) is sent to the moderator in real time.
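The routing described here, together with the 24-hour review of non-flagged messages, can be expressed as a small lookup. The `FlagType` values and the destination strings below are hypothetical labels chosen for illustration, not Marigold Health's internal categories.

```python
from enum import Enum
from typing import Optional


class FlagType(Enum):
    SUICIDAL_IDEATION = "suicidal ideation"
    HOMICIDAL_IDEATION = "homicidal ideation"
    TROLLING = "trolling"


# Flags that go to a clinician in real time; all other flags go to the moderator
CLINICIAN_FLAGS = {FlagType.SUICIDAL_IDEATION, FlagType.HOMICIDAL_IDEATION}


def route(flag: Optional[FlagType]) -> str:
    """Return the (hypothetical) destination for a message given its flag."""
    if flag is None:
        return "study team review within 24 h"  # non-flagged content
    if flag in CLINICIAN_FLAGS:
        return "clinician, real time"
    return "moderator, real time"


print(route(FlagType.HOMICIDAL_IDEATION))  # clinician, real time
print(route(FlagType.TROLLING))            # moderator, real time
print(route(None))                         # study team review within 24 h
```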
We will collect data on feasibility: study recruitment and refusal rates, program completion, follow-up rates, number and length of messages generated, and rates of study attrition. Primary outcomes will be feasibility (75% consenting; 80% retained at 4 weeks) and acceptability (mean of 75% logging in daily; System Usability Scale scores >80; high qualitative satisfaction). At the end of the 6-week study period, participants will be asked to complete a web-based survey measuring participant acceptance with (1) the System Usability Scale, a participant-completed, validated metric of a technology's usability and acceptability [51,52], in which higher scores indicate higher usability [53,54]; and (2) a modified version of the Client Satisfaction Questionnaire-8 [55], a validated measure of intervention satisfaction previously used by our team. Participants who do not complete the web-based survey will be contacted by phone.
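The System Usability Scale has a standard scoring rule: each of its ten items is answered on a 1-5 scale, odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a score from 0 to 100. The sketch below implements that rule and checks a cohort's mean against the >80 target stated above; the function names are hypothetical.

```python
def sus_score(responses):
    """Standard SUS score (0-100) from ten responses on a 1-5 scale."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS requires ten responses on a 1-5 scale")
    total = 0
    for item_number, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded
        total += (r - 1) if item_number % 2 == 1 else (5 - r)
    return total * 2.5


def meets_sus_target(scores, target=80.0):
    """True if the mean SUS score across participants exceeds the target."""
    return sum(scores) / len(scores) > target


print(sus_score([5, 1] * 5))           # 100.0
print(sus_score([4, 2] * 5))           # 75.0
print(meets_sus_target([75.0, 90.0]))  # True (mean 82.5)
```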
At the close of the six-week period, we will conduct qualitative interviews to refine the user interface of the application. Participants will be recruited from the Lifespan Recovery Center, as described above. Recruitment will be purposive, to ensure equal numbers of each sex and to represent patients at a variety of stages of change (e.g., new to treatment, in long-term treatment), and will continue until thematic saturation is reached. We will first test features of the app and document common user actions in-app, with each successive qualitative interview reviewing technical changes implemented since the previous one. We will follow the Think Aloud™ protocol for evaluating technical interfaces, in which individuals first verbalize their thoughts as they navigate the platform. Participants will then be interviewed and asked what they thought was missing and what difficulties they had. We will also show patients samples of engaging text messages and note their responses (e.g., encouraging, overbearing) before adjusting messages for the next group. Participants will be compensated $40 for their participation in qualitative interviews.

DATA ANALYSIS
Quantitative Assessment of Feasibility and Acceptability
Message data generated on the platform will be anonymized and tagged to (1) provide OUD-specific data to train existing models on trolling and suicidal or homicidal ideation, and (2) engineer new features as correlations between language and specific sentiment are uncovered, specifically in: (1) recurrent use of opioids or other substances, (2) craving or urge to use opioids, (3) pain, and (4) negative affect. The first two domains are explicit signals of relapse or impending relapse, whereas perception of poorly controlled pain and negative affect are predictive of relapse [56,57]. We will split tagged data into a training set, used to improve the algorithm, and a test set, used to evaluate its macro-averaged F1 (M-F1) score. As we train, we will also develop new features (e.g., "shaking or sweating are symptoms of withdrawal"). The larger data corpus obtained through this specific aim will allow us to test novel features that encode context and the state of the user's drug use. Examples of these features include techniques to identify slang or euphemisms referring to drugs ("I want some brown sugar") or to identify whether users are at risky locations (e.g., a user at risk of suicide going to the roof of a building). Coding and analysis will be completed by the Marigold Health team.
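Two steps in this paragraph lend themselves to a sketch: mapping slang or euphemisms onto canonical terms before classification, and splitting tagged data into training and test sets. Both are hypothetical illustrations. The three-entry `SLANG_LEXICON` is a placeholder for an annotator-curated lexicon, and splitting by participant rather than by message is a design choice made here (not stated in the protocol) so that one person's phrasing cannot appear in both sets and inflate the test M-F1.

```python
import random
import re

# Placeholder lexicon; a real one would be curated and much larger
SLANG_LEXICON = {
    "brown sugar": "heroin",
    "dope": "heroin",
    "bupe": "buprenorphine",
}


def normalize_slang(text: str) -> str:
    """Replace known slang phrases with canonical terms (longest phrase first)."""
    for phrase in sorted(SLANG_LEXICON, key=len, reverse=True):
        pattern = rf"\b{re.escape(phrase)}\b"
        text = re.sub(pattern, SLANG_LEXICON[phrase], text, flags=re.IGNORECASE)
    return text


def split_by_participant(messages, test_fraction=0.2, seed=0):
    """messages: list of (participant_id, text, label). Returns (train, test)."""
    participants = sorted({pid for pid, _, _ in messages})
    random.Random(seed).shuffle(participants)  # reproducible shuffle
    n_test = max(1, round(len(participants) * test_fraction))
    test_ids = set(participants[:n_test])
    train = [m for m in messages if m[0] not in test_ids]
    test = [m for m in messages if m[0] in test_ids]
    return train, test


print(normalize_slang("I want some brown sugar"))  # I want some heroin

data = [(f"p{i}", f"message {j}", "ok") for i in range(5) for j in range(2)]
train, test = split_by_participant(data)
print(len(train), len(test))  # 8 2
```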

Qualitative Assessment of Feasibility and Acceptability
Analysis will result in aggregated preferences and recommendations about message components. Interviews will be digitally recorded and transcribed. Using thematic analysis, categories and sub-categories related to the outcomes of interest will be grouped, or "coded" [58,59]. Deductive codes will be drawn from the interview guide topics (e.g., participant understanding of intervention design, message content and purpose), and inductive codes will capture additional themes that emerge from participants. All transcripts will be independently double coded and then compared to ensure comprehensiveness. Agreed-upon codes will be entered into NVivo qualitative software [60]. Thematic summaries describing the range of data in each code will be discussed among the entire team and used to adapt and refine the intervention. An audit trail of coding decisions and other aspects of the analysis will be kept.

DISCUSSION
The Marigold App will allow patients to access a tailored support group 24/7 and is augmented with AI tools capable of understanding the emotional sentiment in messages and automatically "flagging" critical or clinically relevant content, creating a scalable system to keep groups safe and constructive. This project plans to demonstrate the robustness of these AI tools by adapting them to catch OUD-specific "flags" in peer messages while also examining the adoptability of the platform among patients with OUD. This novel machine learning solution is poised to increase accessibility of treatment for OUD, one of the fastest-rising sources of morbidity and mortality in the United States. As healthcare systems and payers assume more risk for patients with OUD, there is also strong commercial potential for a mobile platform that can rapidly, ethically, and effectively deliver a moderated peer support community as an adjunct to standard treatment for OUD. One potential downside of moderation is that some users may be less inclined to use the app. The Marigold App offers continuous, economical, real-time access to supervised peer support and has the potential to improve treatment outcomes, which, in turn, could reduce future incidence of overdose and death.
A future proposal will conduct a fully powered randomized controlled trial (RCT) to determine this technology's effect on patient outcomes (retention in OUD treatment and relapse), to quantify potential cost savings, and to further enhance our nuanced NLP tools so that they can adapt to specific individuals, demographic factors, and populations.