A natural language processing approach reveals first-person pronoun usage and non-fluency as markers of therapeutic alliance in psychotherapy

Summary
It remains elusive which language markers derived from psychotherapy sessions are indicative of therapeutic alliance, limiting our capacity to assess and provide feedback on the trusting quality of the patient-clinician relationship. To address this critical knowledge gap, we leveraged feature extraction methods from natural language processing (NLP), a subfield of artificial intelligence, to quantify pronoun and non-fluency language markers that are relevant for communicative and emotional aspects of therapeutic relationships. From twenty-eight transcripts of non-manualized psychotherapy sessions recorded in outpatient clinics, we identified therapists’ first-person pronoun usage frequency and patients’ speech transition marking relaxed interaction style as potential metrics of alliance. Behavioral data from patients who played an economic game that measures social exchange (i.e., the trust game) suggested that therapists’ first-person pronoun usage may influence alliance ratings through patients’ diminished trusting behavior toward therapists. Together, this work supports the view that communicative language features in patient-therapist dialogues could be markers of alliance.


INTRODUCTION
Therapeutic alliance, the collaborative and trusting quality of the patient-clinician relationship, is an active ingredient of successful psychotherapy treatment outcomes including patient engagement, retention, and eventual symptom improvement. 1 However, alliance has been notoriously hard to assess in real-world clinical practice due to conceptual and methodological challenges in finding objective markers that underlie patients' subjective experience of closeness with clinicians. Current gold standard assessments of alliance rely on either self-reports or human observers' qualitative coding of treatment interactions, which are subjective, labor-intensive, and time-consuming. 2 These drawbacks have limited the development of a scalable, real-time feedback system for treatment pairs that can improve clinical outcomes through the timely identification of negative session-level alliance. 3 One promising approach to leverage is the recent adoption of machine learning in health care research. For example, efforts have been made to assess alliance from a variety of measures directly recorded during therapy sessions such as patient-clinician language use, 4,5 head and body movements, 6 facial expressions, 7 respiration rate and heart rate variability, 8 and brain activities. 9,10 These studies have found initial evidence that the behavioral and physiological synchrony between patients and therapists during sessions can be measured as a proxy of alliance. 4,6-9 To be clinically actionable, however, data-driven approaches must identify which specific behavioral features clinicians can pay attention to and potentially adjust to optimize alliance during therapy. Much of the existing computational work unfortunately fails to provide such interpretability.
For example, several studies using natural language processing (NLP), a subset of artificial intelligence that learns data structure from human language, have revealed algorithms that could predict patient-rated alliance, 11 therapist skills, 12,13 and therapeutic rupture events from session transcripts. 14 However, these algorithms are generally trained from the entire set of sentences uttered by a speaker and often provide high dimensional predictive features that are hard to interpret. Indeed, the lack of interpretability of model features in machine learning has been raised as a culprit for clinicians' reluctance to utilize artificial intelligence in health care. 15 In this study, we aimed to address this gap by combining both hypothesis- and data-driven approaches to assess personal pronoun usage and non-fluency in patients and therapists as interpretable linguistic markers of alliance. The rationale for pre-defining speech features this way is mainly 2-fold. First, the capacity of patients and therapists to communicate each other's thoughts and emotions adaptively has been identified as a universal factor that impacts alliance and clinical outcomes across heterogeneous psychotherapy practice settings. 16 Empirically, increased reference to the self in a dialogue (commonly represented by first-person singular personal pronouns, such as ''I'' and ''me'') has been considered a marker of failure to adaptively distance from negative emotional cues 17 and of internalizing symptoms in the text messages of patients in online therapy. 18 The high frequency of self-focus through ''I'' usage has been generally linked with mental health burden, such as depression, 19 post-traumatic stress disorder (PTSD), 20 and compulsivity and intrusive thoughts. 21 By contrast, the use of ''I'' as an active voice 22 and as an interactive agent with a therapist's discourse 23 was associated with positive therapy outcomes, suggesting the importance of engagement in a therapeutic dialogue.
Second, relaxed styles during interactions have been observed as markers of highly affiliative relationships. 24 For example, the frequency of filler pauses (''um''), indicating relaxed production of natural speech, was associated with multiple indices of high alliance interactions, 25 such as a speaker's truthfulness, 26 emotional suppression, 27 and increased attention during a storytelling task in healthy volunteers. 28 Linguistic coordination between two people, in the use of similar words and the rates at which they are said, has been shown to predict empathy, social support, and positive outcomes in individual therapy 29-31 and online mental health support communities. 32 Leveraging the feature extraction methods commonly used in NLP, here we quantified first-person pronouns and non-fluency as communicative function markers of both patients and therapists from single-session transcripts and regressed these features on post-session alliance scores rated by the subjects. As an additional proxy of alliance with the therapy partner, we also administered the trust game, a behavioral economics paradigm that quantifies trust and reciprocity between two people as they play the roles of an ''investor'' and a ''trustee'' during monetary exchange. 33,34 Previous work has demonstrated that clinicians' communication ability was positively associated with patients' trust toward clinicians measured by how much patients would ''repay'' an investment in the trust game. 35 We calculated the subjects' repayment behavior toward their therapy partner as an independent outcome variable of the session and tested whether it would mediate the association between significant linguistic features and self-reported alliance. We hypothesized that the subject's higher first-person pronoun usage would correlate with lower therapeutic alliance.
In contrast, we hypothesized that the subject's higher frequency of non-fluency markers, such as filler word usage, would correlate with higher therapeutic alliance. Though findings on these speech features have previously been limited to patients or healthy controls in the laboratory, we hypothesized that the same direction of correlations would be observed in both patients' and therapists' language features, which together construct a treatment session.
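As a rough illustration of what a first-person pronoun frequency feature could look like, the sketch below counts pronoun tokens and divides by a speaker's total word count. The pronoun set, the tokenization rule, and the example turns are assumptions made for illustration only and are not taken from the study's pipeline.

```python
import re
from collections import Counter

# Hypothetical marker set; the study's actual feature list may differ.
FIRST_PERSON = {"i", "me", "my", "we", "us", "our"}

def pronoun_rates(utterances):
    """Return each pronoun's count divided by the speaker's total word count."""
    tokens = []
    for text in utterances:
        # Lowercase and keep alphabetic tokens; contractions such as "i'm"
        # stay as single tokens and are therefore not counted in this sketch.
        tokens.extend(re.findall(r"[a-z']+", text.lower()))
    total = len(tokens)
    counts = Counter(t for t in tokens if t in FIRST_PERSON)
    return {p: (counts[p] / total if total else 0.0) for p in sorted(FIRST_PERSON)}

# Invented example turns, not study data
therapist_turns = ["I think we should slow down.", "What do you make of that?"]
rates = pronoun_rates(therapist_turns)
print(rates["i"], rates["we"])
```

Normalizing by total word count keeps the feature comparable across speakers who talk for different amounts of time within a session.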

Alliance rating
Working alliance inventory-short form rated by patients ranged from 41 to 84 with mean score of 70 (SD 12). Patient-rated alliance was positively correlated with therapist-rated alliance scores (r = 0.56, p < 0.002) and alliance with the previous therapist (n = 27; 1 missing, r = 0.46, p = 0.02). Therapist-rated alliance was positively correlated with patient's avoidant attachment scores (r = 0.40, p = 0.04). There were no differences in patient's alliance across individual age, sex, diagnosis, attachment scores, duration of treatment, therapist's experience, medium or modality of treatment (p > 0.05) (see Figure S1).

Relationship between trust game behavior and language features
Finally, patients' average repayment fractions toward therapists in the trust game were negatively correlated with their therapists' frequency of speaking ''we'' and ''i'' in the sessions (r = −0.38, p = 0.05; r = −0.53, p = 0.004) (Figures 3A and 3B). Patients' average repayment fractions toward therapists were positively correlated with self-reported alliance ratings (r = 0.48, p = 0.003), whereas therapists' average repayment fractions toward patients were not (Figure 3C). Patients' repayment fractions were not significantly correlated with their AUX-INTJ transition probabilities (r = 0.37, p = 0.06). We also explored whether negative correlations of therapists' first-person pronoun usage with alliance were mediated by repayment behavior. The mediation analysis indicated a significant effect of the indirect path for ''therapist_i'' (a * b = −1.84, p = 0.04, 95% CI = −4.37 to −0.03), but not for ''therapist_we'' (a * b = −3.05, p = 0.05, 95% CI = −7.50 to 0.01), indicating that the patients' perceived trustworthiness of the therapist, as measured by their repayment toward the therapist in the trust game, mediated the effect of therapists' first-person singular pronoun use on patient-reported alliance (Figure 3D).
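The indirect effect a * b reported above is a product of two regression coefficients: a is the slope of the mediator on the predictor, and b is the mediator's coefficient when the outcome is regressed on both predictor and mediator. The sketch below computes only these point estimates with ordinary least squares on invented numbers; it does not reproduce the quasi-Bayesian confidence intervals of the R 'mediation' package, and the variable names are hypothetical.

```python
from statistics import mean

def _centered(values):
    m = mean(values)
    return [v - m for v in values]

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def indirect_effect(x, m, y):
    """Product-of-coefficients estimates for the path x -> m -> y.

    Returns (a, b, c_prime, a * b), where a is the slope of m ~ x and
    b, c_prime are the coefficients of m and x in y ~ x + m.
    """
    xc, mc, yc = _centered(x), _centered(m), _centered(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)
    a = sxm / sxx
    det = sxx * smm - sxm ** 2  # zero only if x and m are perfectly collinear
    b = (sxx * smy - sxm * sxy) / det
    c_prime = (smm * sxy - sxm * smy) / det
    return a, b, c_prime, a * b

# Invented toy data: pronoun frequency, repayment fraction, alliance score
pronoun_freq = [1.0, 2.0, 3.0, 4.0, 5.0]
repayment = [0.9, 0.7, 0.8, 0.4, 0.2]
alliance = [86.0, 78.0, 82.0, 66.0, 58.0]
a, b, c_prime, ab = indirect_effect(pronoun_freq, repayment, alliance)
print(a, b, c_prime, ab)
```

In this toy example the outcome depends on the predictor only through the mediator, so the direct effect is close to zero and the total effect equals a * b.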

DISCUSSION
Scalable, yet interpretable markers of patient-therapist alliance in naturalistic psychotherapy sessions can provide timely and clinically actionable feedback in mental health treatment. Here, we analyzed personal pronoun usage and non-fluency markers using feature extraction methods commonly used in NLP, combined with self-reported surveys of alliance and a game theoretic approach toward alliance (i.e. trust game) (Figure 4). Our study provides the first computational evidence that both first-person pronoun and non-fluency are potential language markers that are predictive of therapeutic alliance and interpersonal trust during psychotherapy treatment.
Our primary finding was that more frequent first-person pronoun usage in both therapists and patients (''we,'' ''i do,'' ''i think,'' ''when i'') characterized sessions with lower alliance ratings by patients cross-sectionally, consistent with the first hypothesis. Sentences containing these features were largely statements disclosing their thoughts and emotions. In a treatment context where patients predominantly expressed themselves (''i'') (Figure 1C), therapists' expression of ''i,'' especially with cognitively geared verbs (e.g. ''i do,'' ''i think''), may have signaled their inadequate responsiveness to patients' emotional needs, which has been associated with negative treatment outcomes. 36 In terms of ''we,'' one might assume that the usage of such a pronoun that signals inclusiveness might correlate with higher, rather than lower alliance. However, when used by therapists, a higher frequency of ''we'' could have indicated their therapeutic techniques to bring the strained relationships the patients were dealing with into the ''we'' mode of togetherness. 37,38 These speech features not only correlated with patients' perception of alliance, i.e. self-report, but also with objectively measured behavioral proxies of trust (Figures 3A-3C). Expanding the previous findings from the trust game literature in which patients repaid higher amounts to clinicians with good communication skills, 35 we also identified a role of interpersonal trust behavior in mediating the negative relationship between the therapist's use of ''i'' and patient-rated alliance (Figure 3D). This result further suggests that the therapist's self-expression might negatively influence alliance through direct cognitive changes related to interpersonal processing in the patient.
[Figure caption fragment: Each sentence is annotated with P = alliance score rated by patient, T = alliance score rated by therapist.]
Regarding the second hypothesis, we found that higher non-fluency in patients (e.g. ''is like,'' ''umm''), but not in therapists, characterized sessions with higher alliance ratings by patients (Figures 2B and 2D). In natural language, honest and emotionally regulated speech often contains non-fluency. 26,27 This offers a plausible explanation for our finding that in our sample, patients who reported stronger alliance were more honest and more willing to effectively communicate their emotions to their therapists. The probability of the auxiliary verb token transitioning to the filler word token also identified the end of patients' sentences being acknowledged by therapists (e.g. ''could. Oh'', ''are. Yeah''). This finding was consistent with previous work demonstrating that interpersonal attunement measured from behavioral coordination 6-9,25 predicted alliance during treatment.
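To make the auxiliary-to-filler transition concrete, the sketch below estimates a bi-gram transition probability from a sequence of part-of-speech tags. It assumes the tags are already available (here a hand-written sequence using the AUX and INTJ labels) and assumes the probability is conditioned on the first tag; the study's exact normalization may differ.

```python
from collections import Counter

def transition_probability(tags, src, dst):
    """Estimate P(next tag == dst | current tag == src) from one tag sequence."""
    pairs = Counter(zip(tags, tags[1:]))
    # Count every bi-gram that starts with the source tag
    src_total = sum(n for (first, _), n in pairs.items() if first == src)
    return pairs[(src, dst)] / src_total if src_total else 0.0

# Hypothetical hand-tagged sequence, not study data
tags = ["PRON", "AUX", "INTJ", "PRON", "AUX", "VERB", "PRON", "AUX", "INTJ"]
p_aux_intj = transition_probability(tags, "AUX", "INTJ")
print(p_aux_intj)
```

Because most tag pairs never occur in a given session, features of this kind tend to be zero-heavy, which is consistent with the rank-based tests the authors report using for transition probabilities.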

Limitations of the study
Our results should be interpreted with the following caveats. First, this is an observational study, which does not provide causal or mechanistic insight into the relationship between linguistic patterns and therapeutic alliance. Future studies that incorporate interventional and/or longitudinal designs (e.g. a clinical trial) might be able to address causality by examining the effects of language stimuli with or without these language features on therapeutic outcomes. Second, we did not analyze the immediate context in which the features were discovered. It is possible that the use of personal pronouns and non-fluency could have different meanings depending on the clinical context in which they were said. Analysis of non-verbal features (e.g. voice acoustics) and/or domain expert annotation could identify such context. Third, the fidelity of paralinguistic markers (e.g. ''umm'') may be limited due to the imperfect nature of human transcription, despite it being the current gold standard. State-of-the-art automatic speech recognition technology that can transcribe paralinguistic markers/disfluencies with good accuracy against human transcription can significantly address this methodological limitation, especially when text analysis is done at scale. Finally, the sample size is small, thus limiting statistical power to detect significant relationships between alliance and other language features outside of our main hypotheses. Nevertheless, this study provides important initial insight that could lay the foundation for larger-scale studies to replicate existing findings and identify additional interpretable and predictive language features of alliance.
In summary, our NLP approach revealed first-person pronoun and non-fluency features as clinically relevant markers of alliance from psychotherapeutic dialogues. As psychotherapy begins to integrate more technology (e.g., teletherapy and text-based therapy), computational analysis of patient-clinician interactions can be a fruitful avenue for elucidating key elements that make treatment effective at scale.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

[Figure 3, continued] (D) Mean repayment fraction across 10 rounds statistically mediated the association between ''therapist_i'' and patient-reported alliance, suggesting a potential mechanism in which therapist's language recruits a trusting behavior to impact alliance. Arrows indicate the direction of linear regressions, annotated with coefficient estimates (standard error) and 95% confidence intervals. *p < 0.05, **p < 0.01.

Materials availability
This study did not generate new unique reagents.
Data and code availability
• De-identified datasets have been deposited at a publicly available repository as of the date of publication. DOIs are listed in the key resources table. The full transcript data in this study cannot be deposited in a public repository because these are withheld by the corresponding author's institution IRB to preserve patient and therapist privacy and confidentiality.
• All original code has been deposited at a publicly available repository as of the date of publication. DOIs are listed in the key resources table.
• Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

EXPERIMENTAL MODEL AND STUDY PARTICIPANT DETAILS
In this cross-sectional study, we recorded twenty-eight single sessions from ongoing psychotherapy treatment (patient n=28, therapist n=18) in two academic hospital-based outpatient clinics, one general adult psychiatric clinic and one personality disorder-specialized clinic in New York City from February 2020 to November 2021. The median age in our patient sample was 39 years old, with a range of 19 to 64 years (SD 15). 20 (71%) of the subjects were female, and 6 (21%) were receiving treatment in the personality disorder clinic. The most commonly represented psychiatric diagnosis was personality disorder 15 (54%, 12 borderline personality, 3 narcissistic personality), followed by mood disorder 7 (25%, 5 unipolar and 1 bipolar depressive disorders, 1 unspecified mood disorder), and anxiety and trauma-related disorders 6 (21%, 2 generalized anxiety disorder, 4 post-traumatic stress disorder). All patients had previous therapy experience with a median number of lifetime therapists of 5, ranging from 2 to 15 (SD 3.8). 12 (71%) of the therapists were female. 6 (25%) were faculty psychologists, and 11 (75%) were trainee psychologists and psychiatrists. All therapists provided non-manualized psychotherapy with supportive and relationally oriented techniques, 24 (86%), or cognitive-behaviorally oriented techniques, 4 (14%), to improve their patients' interpersonal functioning. Eight therapists provided more than one session with different patients in the sample. At the time of recording, patients were at a median of 14.5th session, with a range of 2 to 160 sessions (SD 32). The exclusionary criteria included use of a non-English language during therapy sessions and presence of neurological or other conditions that affect perception and expression of language. Written informed consent and our protocol were approved by the Institutional Review Board at Icahn School of Medicine at Mount Sinai.

QUANTIFICATION AND STATISTICAL ANALYSIS
The personal pronoun and bi-gram transition probability features were linearly regressed against alliance scores using F-test for statistical significance (false discovery rate corrected, α = 0.05). The LIWC feature frequency differences between speaker roles and pairwise correlations with alliance (self-reported ratings and trust game average repayment fraction) were summarized using a paired t-test and Pearson's r after logarithmically transforming the values. For transition probabilities, which were highly skewed towards zero, we used Wilcoxon rank-sum test and Spearman's ρ, respectively. 49 Alliance scores were compared using Kruskal-Wallis one-way ANOVA across categorical clinical variables. Partial correlation corrected by the duration of treatment (only for AUX-INTJ, due to its correlation with treatment duration, which was logarithmically transformed; see Figure S2), and Steiger's Z test to examine differences in dependent correlations (feature ~ subscores of alliance, i.e. goal, task, bond), were performed using 'psych' R package. 50 For exploratory analysis testing if the average repayment fraction from trust game mediates the association between therapist's first person pronoun features and alliance ratings, we used 'mediation' R package. 51 We estimated confidence intervals in the effects of mediation using a quasi-Bayesian approximation approach (1,000 iterations, α = 0.05) and considered the mediation significant if the total indirect effect (a * b) was statistically significant, while the previously significant direct effect (path c) became non-significant after controlling for the mediator. All statistical analyses were conducted with two-sided Type I error of 5%. Python 3.8.
iScience 26, 106860, June 16, 2023
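The text does not state which false discovery rate procedure was applied. As one common possibility, the sketch below implements the Benjamini-Hochberg step-up rule in plain Python; the choice of this procedure and the example p-values are assumptions made for illustration.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(pvalues)
    # Indices of the p-values in ascending order
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected

# Invented p-values, one per candidate language feature
pvals = [0.001, 0.04, 0.03, 0.2]
flags = benjamini_hochberg(pvals)
print(flags)
```

The step-up rule rejects all hypotheses up to the largest rank that meets its threshold, so a feature can survive correction even if its own p-value slightly exceeds its rank-specific cutoff, provided a larger p-value further down the list passes.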