Capturing and missing the patient's story through outcome measures: A thematic comparison of patient‐generated items in PSYCHLOPS with CORE‐OM and PHQ‐9

Abstract
Background: There is increasing interest in individualized patient‐reported outcome measures (I‐PROMS), where patients themselves indicate the specific problems they want to address in therapy and these problems are used as items within the outcome measurement tool.
Objective: This paper examined the extent to which 279 items reported in an I‐PROM (PSYCHLOPS) added qualitative information that was not captured by two well‐established outcome measures (CORE‐OM and PHQ‐9).
Design: Items were compared only for patients scoring above the “caseness” threshold on the standardized measures.
Setting and patients: 107 patients participating in therapy within addiction and general psychiatric clinical settings.
Main results: Almost every patient (95%) reported at least one item whose content was not covered by PHQ‐9, and 71% reported at least one item not covered by CORE‐OM.
Discussion: The results demonstrate the relevance of individualized outcome assessment for capturing the issues of greatest concern to patients, as nomothetic measures do not always seem to capture the whole story.


| INTRODUCTION
Recent studies have shown a renewed interest in the individualized assessment of change during talking therapies. [1][2][3] The strategy relies on individualized patient-reported outcome measures (I-PROMS), where patients themselves indicate the specific problems they want to address in therapy and these problems are used as items within the measurement tool. It is assumed that such an individualized approach is better able to capture the uniqueness of each patient's condition. However, there is scant evidence supporting this assumption. In this study, we contrast an I-PROM with two well-established standardized outcome tools to identify the extent to which patients add items that are not covered by standardized tools.
Psychological outcome assessment typically uses repeated administration of standardized self-report instruments (PROMS) to detect patient change over time according to the nomothetic principles of classical psychometrics. The same instrument, measuring a construct with fixed, pre-selected items that capture variance on universal dimensions of the construct, is administered to all people. For example, to assess treatment outcomes for depression, disorder-specific PROMS may be used, which locate each patient's score relative to population norms, thus allowing a formal diagnosis to be made with acceptable levels of between-diagnostician agreement.
Inferences about clinical recovery can be made from these scores by comparison with dysfunctional population scoring levels, and the scores also inform epidemiological and evidence-based practice research (eg identifying effective treatments for depression). However, nomothetic PROMS may not identify and measure change on key variables that afflict individual depressed patients, including their specific context, difficulties and treatment priorities. Measuring treatment outcomes without missing the uniqueness of the patient's condition requires an idiographic assessment approach, that is, using psychological assessment instruments tailored to each individual. 2,9 Individualized PROMS 2, also called patient-generated outcome measures, evaluate the degree to which a patient changes on items selected by the patient. Items correspond to personally defined problems or pertinent situational variables that can serve as indicators of change on aspects of importance to each patient. 2,10 One advantage of I-PROMS is that they give greater attention to patients' preferences and wishes regarding their health care, which is more closely aligned with the values of patient-centred care. 11 Therapists also report that routine use of I-PROMS is beneficial when preparing for clinical sessions, for elaborating discussions after a session, in supervision meetings, and when making clinical decisions about treatment. 12,13 Furthermore, there is evidence that […] to what extent will they report items that are not covered at all by well-established PROMS? To our knowledge, the only previous study to address this question compared one of the most commonly used I-PROMS, PSYCHLOPS (Psychological Outcome Profiles; see http://www.psychlops.org.uk), with CORE-OM in a community-based talking therapy service in primary care. 5
The results showed that 60% of patients provided novel and relevant clinical information in their free-text responses that would otherwise not have been considered in their outcome assessment. 5 Our study aims to expand on these findings by extending the comparison of PSYCHLOPS to both CORE-OM, a general measure of psychological distress, and PHQ-9, another nomothetic measure in widespread use but with a narrow, depression-specific focus, and by testing these instruments in an entirely different population.

| Participants
Two distinct samples (psychiatric patients and drug and alcohol patients; total n = 107) were enrolled to provide a broad clinical range, so that the findings would generalize to a secondary care population. The inclusion criteria were to be over 18 years […] (see Table S1).

| Instruments
PSYCHLOPS 15 is a brief I-PROM containing three free-text items completed by the patient: "Choose the problem that troubles you most," "Choose another problem that troubles you" and "Choose one thing that is hard to do because of your problem (or problems)." Each free-text item is rated on a 6-point Likert scale for severity (from "0 = not at all affected" to "5 = severely affected") and duration (from "0 = under 1 month" to "5 = over 5 years"). PSYCHLOPS also contains a fourth, preset question ("How have you felt in yourself this last week," scored from "0 = very good" to "5 = very bad"). Together these questions cover three domains: problems, functioning and well-being. We excluded well-being because this question is standardized and contains no qualitative data, so the comparison between instruments used only the PSYCHLOPS responses to the problem and functioning domains. 16
The Patient Health Questionnaire-9 (PHQ-9) 17 is a 9-item multipurpose tool for screening, diagnosing and monitoring the severity of depression according to DSM-IV criteria. Items are scored from 0 ("not at all") to 3 ("nearly every day") and refer to the patient's experience over the preceding two weeks. All measures were administered in Portuguese.
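To make the structure of the two instruments concrete, the following Python sketch models one PSYCHLOPS administration and a PHQ-9 total. The class and function names (`FreeTextItem`, `PsychlopsRecord`, `phq9_total`) and the example wording are hypothetical and do not correspond to any official PSYCHLOPS or PHQ-9 data format; the value ranges are those described above, and the PHQ-9 total is the sum of its nine items (range 0 to 27).

```python
from dataclasses import dataclass


@dataclass
class FreeTextItem:
    """One patient-generated PSYCHLOPS item (hypothetical representation)."""
    domain: str    # "problem" or "functioning"
    text: str      # the patient's own wording
    severity: int  # 0 = not at all affected ... 5 = severely affected
    duration: int  # 0 = under 1 month ... 5 = over 5 years

    def __post_init__(self) -> None:
        # Both ratings use the 6-point (0-5) scale described in the text.
        for name in ("severity", "duration"):
            value = getattr(self, name)
            if not 0 <= value <= 5:
                raise ValueError(f"{name} must be between 0 and 5, got {value}")


@dataclass
class PsychlopsRecord:
    problem_1: FreeTextItem
    problem_2: FreeTextItem
    functioning: FreeTextItem
    well_being: int  # preset question: 0 = very good ... 5 = very bad

    def qualitative_items(self) -> list[FreeTextItem]:
        """Items carried into the content comparison: problems and functioning.

        The well-being question is left out because it has no free text.
        """
        return [self.problem_1, self.problem_2, self.functioning]


def phq9_total(item_scores: list[int]) -> int:
    """Sum of the nine PHQ-9 items, each scored 0 ("not at all") to 3 ("nearly every day")."""
    if len(item_scores) != 9 or any(not 0 <= s <= 3 for s in item_scores):
        raise ValueError("PHQ-9 needs nine item scores between 0 and 3")
    return sum(item_scores)


# Hypothetical example patient, not study data.
record = PsychlopsRecord(
    problem_1=FreeTextItem("problem", "I can't sleep at night", severity=4, duration=2),
    problem_2=FreeTextItem("problem", "Debts are piling up", severity=3, duration=3),
    functioning=FreeTextItem("functioning", "Going to work", severity=4, duration=1),
    well_being=4,
)
print([item.text for item in record.qualitative_items()])
print(phq9_total([2, 1, 3, 2, 1, 1, 2, 0, 0]))  # prints 12
```

Validating ranges at construction time means an out-of-range rating is rejected as soon as the record is built, before any free text is passed on for coding.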

| Procedure
Patients were invited to arrive at the hospital one hour before their first appointment for a pre-treatment evaluation session.
PSYCHLOPS was administered first, followed by CORE-OM and PHQ-9 in random order; a socio-demographic data collection form was presented at the end. Patients with literacy or visual problems were not excluded but were offered support by a research assistant, who administered the tools orally. The analysis procedure followed three major steps:

| Free-text coding
The free-text responses were coded using a 61-subtheme classification system, 18 […] and CB) coding each item independently; when agreement could not be reached, a process of triangulation was adopted in discussion with the study supervisor (CS). Inter-rater reliability, given by the average of Cohen's kappa across all rater pairs, was strong (problem 1 = 0.81; problem 2 = 0.83; functioning item = 0.89). Triangulation was required for less than 1% of the items (n = 14).
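The reliability statistic described above, Cohen's kappa averaged over all rater pairs, can be sketched in plain Python as follows. This illustrates the standard two-rater kappa formula rather than reproducing the authors' SPSS procedure; the function names, rater labels and subtheme codes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two raters who each assigned one nominal code per item."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must code the same, non-empty list of items")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: for each code, product of the two raters' marginal proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical code")
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(codes_by_rater: dict[str, list[str]]) -> float:
    """Average Cohen's kappa over every pair of raters."""
    pairs = list(combinations(codes_by_rater, 2))
    if not pairs:
        raise ValueError("at least two raters are needed")
    total = sum(cohen_kappa(codes_by_rater[r1], codes_by_rater[r2]) for r1, r2 in pairs)
    return total / len(pairs)


# Hypothetical subtheme codes for four "problem 1" responses from three raters.
codes = {
    "rater_A": ["Sleep", "Sleep", "Sleep", "Money"],
    "rater_B": ["Sleep", "Sleep", "Sleep", "Money"],
    "rater_C": ["Sleep", "Sleep", "Money", "Money"],
}
print(round(mean_pairwise_kappa(codes), 3))  # A-B = 1.0, A-C = B-C = 0.5, so 0.667
```

Because kappa discounts the agreement expected from each rater's marginal code frequencies, two raters who agree on 75% of items (A and C here) obtain a kappa of only 0.5.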

| Content matching
The 65 subthemes derived from PSYCHLOPS responses were compared with the content of CORE-OM and PHQ-9 (see Tables S2 and S3, respectively). Two independent judges (IN and RC) determined whether each subtheme did or did not map directly onto items included in CORE-OM and PHQ-9, classifying each match into one of four categories: (1) definite yes: there is a direct and clear match of content (eg the subtheme "Sleeping problems" and a CORE-OM/PHQ-9 item that reports problems with sleeping); (2) possible yes: the subtheme reports a problem that is probably related to a problem reported on CORE-OM or PHQ-9 (eg problems of "concentration at work" could probably be connected to CORE-OM/PHQ-9 anxiety items); (3) possible no: vague or general subthemes that might or might not be associated with CORE-OM or PHQ-9 items (eg "Relationships" is a vague statement, and it is difficult to determine whether it matches any CORE-OM or PHQ-9 item); (4) definite no: there is clearly no match, as the subtheme has different content. When agreement could not be reached, a third judge was consulted and the original free-text responses on PSYCHLOPS were compared with the CORE-OM (or PHQ-9) items to provide more evidence on matching. Inter-rater reliability (two-way mixed intraclass correlations, average measures, absolute agreement) was strong for content matching with CORE-OM (ranging from 0.92 for item 23 to 1.00) and PHQ-9 (ranging from 0.99 for item 9 to 1.00).
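The intraclass correlation named above can be computed from a two-way ANOVA decomposition. The Python sketch below implements the average-measures, absolute-agreement form (McGraw and Wong's ICC(A,k)); for this form the point estimate uses the same formula whether judges are treated as fixed (two-way mixed) or random, so the model choice affects interpretation rather than arithmetic. Coding the four matching categories as ordinal scores 1 to 4, and arranging each nomothetic item's ratings as a subthemes-by-judges table, are assumptions made for illustration; the function name and ratings are hypothetical.

```python
def icc_a_k(ratings: list[list[float]]) -> float:
    """Average-measures, absolute-agreement ICC (McGraw and Wong's ICC(A,k)).

    ratings[i][j] is judge j's rating of subject i; the table must be complete.
    """
    if len(ratings) < 2 or len(ratings[0]) < 2 or any(len(r) != len(ratings[0]) for r in ratings):
        raise ValueError("need a complete table with at least 2 subjects and 2 judges")
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA decomposition: subjects (rows), judges (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined for this table (eg all ratings identical)")
    return (ms_rows - ms_error) / denominator


# Hypothetical ratings on the 1-4 matching scale (1 = definite yes ... 4 = definite no).
agree = [[1, 1], [4, 4], [3, 3], [4, 4], [2, 2]]  # two judges in full agreement
shifted = [[1, 2], [2, 3], [3, 4]]                # judge 2 always one category higher
print(round(icc_a_k(agree), 3))    # 1.0
print(round(icc_a_k(shifted), 3))  # 0.8
```

The second table shows why absolute agreement matters for categorical judgements: the judges order the subthemes identically, so a consistency-type ICC would be 1, but the systematic one-category offset lowers the absolute-agreement ICC to 0.8.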
Judges were aware of the aim of the categorization (they knew the hypothesis that I-PROMS capture additional information in the outcome measurement process), which could introduce coding bias. To minimize this effect, a separate coding database was prepared for each coder, containing only the anonymized free-text items, with no information on patients' demographic or clinical data or on their scores on PSYCHLOPS or its nomothetic counterparts.
Moreover, only items classified as "definite no" were counted as non-matching; items classified as "possible no" were not.

| Descriptive statistics
We calculated the frequency of each subtheme in PSYCHLOPS and the number of patients who indicated each subtheme. We also calculated the number and proportion of patients above the clinical threshold who indicated at least one subtheme that did not map onto CORE-OM or PHQ-9 (ie the frequency of patients with at least one "definite no" item). This comparison was confined to patients classified as "cases" by each nomothetic instrument: we compared PSYCHLOPS with PHQ-9 only for items formulated by patients classified as depressed, and PSYCHLOPS with CORE-OM only for patients classified as having clinical psychological distress. For both CORE-OM and PHQ-9, "caseness" was defined as a score of ≥10. 19,20 Analyses were conducted in IBM SPSS Statistics 21®.
[…] Table S4, patients entering substance misuse treatment reported more addiction, work-related and money problems, whereas patients in the psychiatric setting more often reported worries about someone in their family and about health.
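The descriptive statistics above involve two different counts (items per subtheme versus patients per subtheme) and a proportion restricted to "cases". The following is a minimal Python sketch, assuming each patient's free-text items have already been coded into subthemes and each subtheme has one agreed matching category per measure. The class and function names, subtheme labels, scores and matching judgements are all hypothetical and are not the study's data.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Match(Enum):
    """The four content-matching categories used by the judges."""
    DEFINITE_YES = 1
    POSSIBLE_YES = 2
    POSSIBLE_NO = 3
    DEFINITE_NO = 4


@dataclass
class Patient:
    subthemes: list[str]      # coded subtheme of each PSYCHLOPS free-text item
    scores: dict[str, float]  # total score on each nomothetic measure


CASENESS_CUTOFF = {"CORE-OM": 10, "PHQ-9": 10}  # "caseness" = score >= 10 on each measure


def subtheme_frequencies(patients: list[Patient]) -> tuple[Counter, Counter]:
    """Items per subtheme, and patients per subtheme (each patient counted once)."""
    items = Counter(s for p in patients for s in p.subthemes)
    people = Counter(s for p in patients for s in set(p.subthemes))
    return items, people


def cases_with_unmatched_item(
    patients: list[Patient], measure: str, matching: dict[str, Match]
) -> tuple[int, int]:
    """(cases with at least one "definite no" subtheme, number of cases) for one measure."""
    cases = [p for p in patients if p.scores[measure] >= CASENESS_CUTOFF[measure]]
    # Only "definite no" counts as non-matching; "possible no" does not.
    flagged = [p for p in cases if any(matching[s] is Match.DEFINITE_NO for s in p.subthemes)]
    return len(flagged), len(cases)


# Hypothetical patients and PHQ-9 matching judgements, not the study's data.
patients = [
    Patient(["Sleeping problems", "Money", "Work"], {"CORE-OM": 15, "PHQ-9": 12}),
    Patient(["Sleeping problems", "Sleeping problems", "Relationships"], {"CORE-OM": 8, "PHQ-9": 11}),
    Patient(["Addiction", "Money", "Health worries"], {"CORE-OM": 19, "PHQ-9": 6}),
]
phq9_matching = {
    "Sleeping problems": Match.DEFINITE_YES,
    "Money": Match.DEFINITE_NO,
    "Work": Match.POSSIBLE_NO,
    "Relationships": Match.POSSIBLE_NO,
    "Addiction": Match.DEFINITE_NO,
    "Health worries": Match.DEFINITE_NO,
}
items, people = subtheme_frequencies(patients)
print(items["Sleeping problems"], people["Sleeping problems"])  # 3 2
flagged, n_cases = cases_with_unmatched_item(patients, "PHQ-9", phq9_matching)
print(f"{flagged}/{n_cases} PHQ-9 cases have at least one 'definite no' item")  # 1/2
```

Restricting the denominator to cases means the third patient, who is below the PHQ-9 cut-off, does not enter the PHQ-9 comparison even though two of their subthemes were judged "definite no"; likewise the second patient's "possible no" subtheme does not count as non-matching.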

| RESULTS
The comparison between the 51 PSYCHLOPS subthemes and the two nomothetic measures showed that a large proportion of subthemes were not present in CORE-OM (33.3% classified as "definite no") or in PHQ-9 (84.3% classified as "definite no") (see Table S4). A large proportion of