Interpretation threshold values for patient-reported outcomes in patients participating in a digitally delivered first-line treatment program for hip or knee osteoarthritis

Objective: Establish proportions of patients reporting important improvement, acceptable symptoms and treatment failure and define interpretation threshold values for pain, patient-reported function and quality-of-life after participating in digital first-line treatment including education and exercise for hip and knee osteoarthritis (OA).
Methods: Observational study. Responses to the pain Numeric Rating Scale (NRS, 0–10 best to worst), Knee injury and Osteoarthritis Outcome Score 12 (KOOS-12) and Hip disability and Osteoarthritis Outcome Score 12 (HOOS-12, both 0–100 worst to best) were obtained for 4383 (2987) and 2041 (1264) participants with knee (hip) OA at 3 and 12 months post intervention. Threshold values for Minimal Important Change (MIC), Patient Acceptable Symptom State (PASS) and Treatment Failure (TF) were estimated using anchor-based predictive modeling.
Results: 70–85% reported an important improvement in pain, function and quality of life after 3 and 12 months follow-up. 42% (3 months) and 51% (12 months) considered their current state as satisfactory, whereas 2–4% considered treatment failed. MIC values were −1 (NRS) and 0–4 (KOOS/HOOS-12) across follow-ups and joint affected. PASS threshold value for NRS was 3, and 53–73 for the KOOS/HOOS-12 subscales. Corresponding values for TF were 5 (NRS) and 34–55 (KOOS/HOOS-12). Patients with more severe pain at baseline had higher MIC scores and accepted poorer outcomes at follow-ups.
Conclusion: Threshold estimates aid in the interpretation of outcomes after first-line OA interventions assessed with NRS Pain and KOOS/HOOS-12. Baseline pain severity is important to consider when interpreting threshold values after first-line interventions in these patients.


Background
Patient-reported outcome measures (PROMs) are recommended as the primary end-point in clinical trials evaluating treatment effects in medical conditions [1]. For a meaningful interpretation of the PROMs used, different approaches have been proposed [2], such as the Minimal Important Change (MIC), which is the smallest change in scores that represents an important improvement for the average patient, i.e., the patient is feeling better [3], the Patient Acceptable Symptom State (PASS), which represents the cut-off for the health status that the average patient considers acceptable, i.e., the patient is feeling good [4,5], and Treatment Failure (TF), which represents the cut-off below which the average patient considers their state so unsatisfactory that they think the treatment has failed [2], all based on relevant anchor questions.
Previous research in patients with anterior cruciate ligament (ACL) injury [2] and femoroacetabular impingement [6] revealed that feeling better is not necessarily the same as feeling good and reporting only the MIC (or mean change) may overestimate the results. For the patient, reaching an acceptable state of symptoms seems to be more important than feeling better [5], stressing the need for including more than a single measure of improvement to evaluate treatment effects on PROMs.
In patients with hip and/or knee osteoarthritis (OA), the Numeric Rating Scale (NRS), evaluating joint pain, and the Knee Injury and Osteoarthritis Outcome Score (KOOS) or the Hip disability and Osteoarthritis Outcome Score (HOOS), evaluating joint pain, symptoms, function and health-related quality of life, are commonly used PROMs for assessing improvement during rehabilitation/treatment [7]. Although thresholds for MIC, PASS and TF have been established for the KOOS in patients after ACL injury [2,8,9] and meniscal surgery [10], such thresholds are not clearly defined in patients with OA. MIC values ranging between −1.5 and 21 were reported for the different KOOS subscales at 1-year follow-up after non-surgical treatment of knee OA [11], while an MIC between 12 and 15 points for the KOOS subscales was found after 4 weeks of physical therapy treatment for knee OA [12]. However, the anchor questions used in these studies seem not to have been domain based, which may have compromised the validity of the results [13]. A few studies have examined the MIC of a subset of subscales (pain, quality of life) or short-forms of the KOOS/HOOS (PS) scores after surgical procedures, such as total knee replacement or hip arthroplasty [14–16]. The specific cut-offs that are relevant for patients are suggested to be population- and context-based [17], and the MIC, PASS and TF for the KOOS/HOOS and NRS in patients undergoing first-line treatment, including education and exercise, for hip or knee OA remain to be determined.
The aims of the present study were to 1) establish the proportion of patients reporting important improvement, acceptable symptom-levels and/or treatment failure at 3- and 12-month follow-ups after participating in digital first-line treatment for hip or knee OA, and 2) define MIC, PASS and TF cut-offs in HOOS/KOOS and the NRS-scales at both follow-ups, using anchor-based methodology and predictive modeling.

Methods
This was a retrospective analysis of prospectively collected data, adhering to the STROBE-guidelines for observational studies [18]. The study was approved by the Swedish Ethical Review Board (2021-06-16, Dnr 2021-01713) and pre-registered at ClinicalTrials.gov (NCT05316194). All participants gave digital consent prior to data extraction and analysis for research.

Intervention
All participants took part in a digitally delivered first-line treatment program for hip or knee OA, described in detail elsewhere [19]. The program is app-based, inspired by the Swedish face-to-face management program for OA "Better management of patients with OsteoArthritis" [20], and includes weekly educational sessions, individualized exercises and the possibility to chat asynchronously with a physical therapist during the entire duration of the program. Data for participant demographics (e.g., sex, age, painful joint (hip/knee), body mass index (BMI), educational level) were collected at registration. All participants were asked to answer a set of PROMs at the beginning of treatment (baseline) and at follow-ups of 3 and 12 months, as well as anchor questions at both follow-ups.

Participants
Data for all participants enrolled in the program from inception to July 2021 were extracted from the digital treatment register. Inclusion criteria were i) diagnosed hip or knee OA, ii) registration in the digital program with at least one session with the physical therapist and two weeks of program participation, and iii) answers provided to the KOOS/HOOS questionnaires and/or NRS pain at baseline and at any of the follow-ups at 3 (10-14 weeks) and/or 12 months (50-54 weeks), and to the anchor questions at any follow-up.

Questionnaires and anchor-based questions
The short versions of the KOOS (KOOS-12) (knee OA) and HOOS (HOOS-12) (hip OA) [21,22] were used for evaluating knee/hip pain, function and quality of life. KOOS/HOOS-12 [23] include 12 items of the original KOOS/HOOS questionnaires [23] measuring knee/hip pain, physical function and knee/hip-related quality of life. All items were scored from 0 to 4. The scores were then normalized to a score from 0 to 100 for each domain (pain, function and quality of life) as well as a total score of all three domains where 0 indicated extreme problems and 100 indicated no problems.
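The normalization described above can be sketched as follows. This is a minimal illustration, not the official scoring routine: it assumes each item is coded 0 (no problems) to 4 (extreme problems), that each domain has four items, and that the total score is the mean of the three domain scores. The function names `domain_score` and `koos12_scores` are hypothetical.

```python
from statistics import mean


def domain_score(items):
    """Convert item responses to a 0-100 domain score (100 = no problems).

    Assumption: items are coded 0 (no problems) to 4 (extreme problems).
    """
    if any(not 0 <= item <= 4 for item in items):
        raise ValueError("item responses must lie between 0 and 4")
    return 100.0 - 25.0 * mean(items)


def koos12_scores(pain, function, qol):
    """Return the three domain scores and a total score.

    Assumption: the total is the mean of the three domain scores.
    """
    scores = {
        "pain": domain_score(pain),
        "function": domain_score(function),
        "qol": domain_score(qol),
    }
    scores["total"] = mean(list(scores.values()))
    return scores


# Hypothetical respondent, not study data
example = koos12_scores(pain=[1, 2, 1, 2], function=[0, 1, 1, 0], qol=[3, 2, 3, 2])
print(example)
```

The same transformation applies to the HOOS-12 under the same assumptions, since both short forms share the 0-4 item coding and the 0-100 scale.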
Joint pain was measured with the NRS-scale. NRS comprises an 11-point scale where 0 indicated no pain and 10 indicated the worst possible pain during the last week [24].
The MIC, PASS and TF associated with KOOS/HOOS-12 (for each domain and for total score) and NRS scores, respectively, were calculated using domain specific anchor questions adapted from previous studies [8,9], all outlined in Table 1. For the total KOOS/HOOS-12 score, the anchor question with the highest correlation to the total score was used. For the MIC score, participants had 7 response options ranging from "an important improvement" to "an important deterioration", whereas the PASS and TF questions had a dichotomous reply (yes/no) (Table 1).

Statistical analysis
For MIC calculations, participants were graded as "importantly improved" if they answered "Better, an important improvement" or "Somewhat better but enough to be an important improvement" to the anchor question. The MIC for an important improvement was then calculated with the predictive modeling method, using logistic regression analysis with improved/not improved as the dependent variable and the specific PROM change score as the independent variable. This method provides more precise estimates than the more traditional method based on receiver operating characteristic (ROC) analysis [25]. Additionally, predictive modeling allows adjustment for the bias that arises when the proportions of participants reporting and not reporting an important improvement are unequal [26]. The MIC value corresponds to the change score that is equally likely in respondents reporting improvement and in those reporting no improvement and reflects the average of the unmeasurable individual thresholds for important improvement [25,26].
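The core of the predictive modeling calculation can be sketched as below. This is an illustrative, unadjusted version only: it fits a one-predictor logistic regression and returns the change score at which the predicted odds of improvement equal the observed (prior) odds, which is the point where a change score is equally likely in the improved and not-improved groups. The adjustment for the proportion improved [26] is not included, the data are simulated rather than taken from the study, and the function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, max_iter=100, tol=1e-10):
    """Fit logit P(y = 1 | x) = b0 + b1 * x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p          # gradient, intercept
            g1 += (yi - p) * xi   # gradient, slope
            h00 += w              # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def predictive_mic(change, improved):
    """Change score where the predicted odds equal the prior odds."""
    b0, b1 = fit_logistic(change, improved)
    prior = sum(improved) / len(improved)
    prior_log_odds = math.log(prior / (1.0 - prior))
    return (prior_log_odds - b0) / b1


# Simulated example (not study data): improved respondents change by
# 10 points on average, not-improved by 0, both with SD 8.
random.seed(1)
improved = [1] * 2000 + [0] * 1000
change = [random.gauss(10, 8) for _ in range(2000)]
change += [random.gauss(0, 8) for _ in range(1000)]
mic = predictive_mic(change, improved)
print(round(mic, 1))
```

With equal spread in both groups, the estimate should fall near the midpoint between the two group means, even though two thirds of the simulated respondents are improved. The calculation works unchanged for scales where improvement is a negative change, such as the NRS.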
The predictive modeling method was also used to calculate the post score (i.e., 3 and 12 months) for KOOS/HOOS-scores and NRS-scores for participants that had responded "Yes" to the PASS or TF anchor question, respectively. The proportion of participants reaching MIC, PASS and TF for the KOOS/HOOS and NRS based on the response to the anchor questions, were calculated separately for participants with knee and hip OA.
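Written out, and assuming the PASS and TF thresholds use the same criterion that the text gives for the MIC (the score that is equally likely in respondents answering "Yes" and "No"), the threshold can be expressed as below. The symbols are notation introduced here for illustration: s is the follow-up score, \pi the observed proportion answering "Yes" to the anchor question, \beta_0 and \beta_1 the fitted logistic regression coefficients, and s^{*} the resulting threshold.

```latex
\operatorname{logit} P(\text{anchor} = \text{yes} \mid s) = \beta_0 + \beta_1 s,
\qquad
s^{*} = \frac{\ln\!\left(\dfrac{\pi}{1-\pi}\right) - \beta_0}{\beta_1}
```

At s^{*} the predicted probability of answering "Yes" equals \pi, so the posterior odds equal the prior odds and the likelihood ratio is 1.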
Subgroup analyses on the effect of baseline severity on MIC, PASS and TF were also performed using the predictive modeling method [25,26]. To avoid potential bias associated with baseline dependency calculations, all participants were divided into two groups (high/low severity) using baseline mean NRS pain scores for KOOS/HOOS analyses and mean KOOS/HOOS pain scores for the NRS analyses as cut-off, in line with previous description [27]. Sensitivity analyses for MIC scores including only participants with responses at both 3 and 12 months follow-up were calculated using the predictive modeling method.
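The severity split described above can be sketched as follows. This is a minimal illustration under stated assumptions: participants scoring exactly at the mean are placed in the low-severity group (the text does not say how ties were handled), and the record layout, field names and function name are hypothetical.

```python
from statistics import mean


def split_by_baseline_mean(records, key, higher_is_worse=True):
    """Split records into low/high severity at the mean of records[key].

    higher_is_worse=True suits NRS pain (10 = worst pain);
    higher_is_worse=False suits KOOS/HOOS pain (0 = worst).
    Assumption: values equal to the mean go to the low-severity group.
    """
    cutoff = mean(r[key] for r in records)
    if higher_is_worse:
        high = [r for r in records if r[key] > cutoff]
        low = [r for r in records if r[key] <= cutoff]
    else:
        high = [r for r in records if r[key] < cutoff]
        low = [r for r in records if r[key] >= cutoff]
    return cutoff, low, high


# Hypothetical participants, not study data
participants = [
    {"id": 1, "nrs_baseline": 2, "koos_pain_baseline": 90},
    {"id": 2, "nrs_baseline": 4, "koos_pain_baseline": 70},
    {"id": 3, "nrs_baseline": 6, "koos_pain_baseline": 50},
    {"id": 4, "nrs_baseline": 8, "koos_pain_baseline": 30},
]

# Grouping for KOOS/HOOS analyses uses baseline NRS pain
nrs_cutoff, nrs_low, nrs_high = split_by_baseline_mean(participants, "nrs_baseline")

# Grouping for NRS analyses uses baseline KOOS/HOOS pain
koos_cutoff, koos_low, koos_high = split_by_baseline_mean(
    participants, "koos_pain_baseline", higher_is_worse=False
)
```

Threshold estimation would then be run separately within each severity group.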

Results
Data for 11,708 participants were extracted from the registry whereof 6952 and 4756 had knee and hip OA, respectively. Of these, 2058 (29.6%) of participants with knee OA and 1507 (31.7%) of those with hip OA had no follow-up response and were excluded. Of the remaining, we included 4383 (63% of total sample) participants with knee OA at 3 months follow-up and 2041 (29% of total sample) at 12-month follow-up with complete data on the PROMs and anchor questions. Corresponding numbers for hip OA were 2987 (63% of total sample) and 1264 (27% of total sample) individuals (See Table 2 for characteristics).
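The participant flow reported above can be checked with a few lines of arithmetic. The counts are taken from the text; the percentages are computed against the number of extracted participants for each joint, which is how "total sample" is read here. The variable and function names are illustrative.

```python
# Counts as reported in the text
extracted = {"knee": 6952, "hip": 4756}
no_follow_up = {"knee": 2058, "hip": 1507}
included_3m = {"knee": 4383, "hip": 2987}
included_12m = {"knee": 2041, "hip": 1264}


def pct(part, whole, digits=1):
    """Percentage of `whole` represented by `part`, rounded."""
    return round(100.0 * part / whole, digits)


total_extracted = sum(extracted.values())
print("total extracted:", total_extracted)

for joint in ("knee", "hip"):
    n = extracted[joint]
    print(
        joint,
        "no follow-up:", pct(no_follow_up[joint], n),
        "included at 3 months:", pct(included_3m[joint], n, 0),
        "included at 12 months:", pct(included_12m[joint], n, 0),
    )
```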

MIC
The correlations between the KOOS/HOOS and NRS scores and the anchor questions ranged between −0.34 and −0.52 (Appendix A, Table 1). The anchor with the highest correlation to KOOS/HOOS total was the question regarding pain and this anchor was used in the analyses for total scores.
Proportions of participants importantly improved at 3 and 12 months ranged from 69.5 to 84.3% for the different KOOS/HOOS and NRS scores.
For participants with knee OA, the MIC for improvement at 3 months was −1 for NRS and ranged between 2 and 4 for the different KOOS subscales and total score. The corresponding values at 12 months were −1 (NRS) and 0-4 (KOOS subscales and total score).
For participants with hip OA, the MIC for improvement at 3 months was −1 for NRS and ranged between 2 and 3 for the different HOOS subscales and total score. The corresponding values at 12 months were −1 (NRS) and 0-4 (HOOS subscales and total score) (Table 4 and Appendix B, Figs. 2-4).
Table 1 Anchor-questions used to evaluate Minimal Important Change (MIC), Patient Acceptable Symptom State (PASS) and Treatment Failure (TF).

PASS
NRS pain PASS thresholds at 3 and 12 months of follow-up were 3.0 and 2.7, respectively, for both knee and hip OA. KOOS/HOOS PASS (subscales and total score) thresholds at 3 and 12 months follow-up ranged between 53 and 71 (knee OA) and 56 and 73 (hip OA) (Table 5).

TF
At 3 months, 150 (3.4%) participants with knee OA and 118 (4.0%) participants with hip OA reported their current function to be so unsatisfactory that they considered the treatment to have failed. The corresponding numbers of participants at 12 months were 49 (2.4%) and 21 (2.1%) (Table 3 and Appendix B, Fig. 5).

Subgroup analysis
The subgroup analysis on the effect of baseline pain severity showed that the MIC values were highly dependent on baseline pain severity. Thus, in patients with worse symptoms at baseline, MIC thresholds reflected larger needed absolute improvement in comparison to patients with lesser symptom levels, while PASS and TF thresholds in those with worse symptoms at baseline reflected poorer outcomes, i.e. those with greater baseline symptom severity accepted a poorer health state as satisfactory. This was true for both pain and KOOS/HOOS-12 scores (subscales and total) irrespective of joint affected (Appendix A, Tables 2 and 3).

Sensitivity analysis
The analysis of NRS pain MIC values at 3 and 12 months, including only those responding at both follow-ups, indicated no important change in the results (Appendix A, Table 4).

Discussion
In the present study, at least 70% of the participants reported an important improvement in pain, function and quality of life after 3-12 months participation in a digital first-line treatment for hip and knee OA. Forty-two and 51% considered their current state as satisfactory at 3 and 12 months, respectively, whereas only a few (2-4%) considered that the treatment had failed, irrespective of joint affected. One unit improvement in pain on the NRS scale was considered an MIC at both follow-ups for both patients with knee OA and hip OA. The corresponding values for the different KOOS/HOOS-12 subscales ranged from 0 to 4. Participants with worse symptoms at baseline required larger improvements to be considered an MIC for all PROMs at follow-up and accepted poorer outcomes in relation to PASS and TF compared to those with lesser baseline symptoms, indicating that all interpretation threshold values were highly dependent on baseline severity.
We found a one unit reduction on the NRS to represent MIC at both 3 and 12 months follow-up after participation in a digital first-line treatment for hip and knee OA. This result establishes this cut-off as an MIC in pain in people with hip and knee OA and could thus be used as a clinically relevant cut-off in future research trials in this population.
The MIC scores reported for the different KOOS/HOOS-12 subscales in this study ranged between 0 and 4 for both follow-ups. This is a considerably lower MIC than most of the scores previously reported in patients with knee OA after non-surgical treatment, where MICs between 3 and 18 [11] and 12 to 15 [12] were observed for the different subscales of the full KOOS questionnaire. There may be several reasons for this discrepancy. In contrast to the studies by Mills et al. [11] and Mostafaee et al. [12], we used the short versions (KOOS/HOOS-12) [21,22] of the original KOOS/HOOS questionnaires as well as specific domain based anchor questions for each subscale instead of the same anchor across subscales. In addition, we used the predictive modeling method instead of ROC-analysis since recent research proposed the predictive modeling approach to be a more precise measure when evaluating these thresholds [25]. Unlike ROC-analysis, the predictive modeling method takes into account unequal variances between groups and prevalence of improved patients [25,26], which may be a reason for the lower values in our study.
Mahler et al. reported a PASS cut-off of 53 for KOOS-physical function (PS) after 3 months participation in a stepped care approach including education, physical therapy and analgesics for knee OA [28]. The KOOS-PS was derived from the two KOOS subscales Activity of Daily Living and Sport recreation and holds a different set of items than the KOOS/HOOS-12 function subscales used in the current study. Mahler et al. used a generic anchor (i.e., not specifically related to function) and the PASS score was calculated using the 75th percentile of the cumulative KOOS-PS score, which may further explain the differences between that study and the current study. The PASS cut-offs for KOOS/HOOS-12 scores (subscales; pain, function and quality-of-life) ranged between 53 and 73 in the current study, further highlighting that no single PASS cut-off should be used across KOOS/HOOS dimensions.
Although approximately 70%-80% reported having experienced an important improvement in all PROMs after program participation, similar to the study by Mahler et al. [28], only around 40%-50% reported an acceptable symptom state at the different follow-ups. Experiencing an important improvement is, as reported in e.g. patients with ACL injury [2], not the same as being satisfied with the current state of function or symptoms. Using the MIC as the only responder criterion may overestimate the results also in patients with hip and knee OA. The PASS score for pain was approximately 3 on the NRS scale, irrespective of time point (3 vs. 12 months) and joint affected, while the corresponding values for KOOS/HOOS-12 scores ranged between 53 and 73. The relatively small changes in PROM scores (−1 (NRS), 0-4 (KOOS/HOOS)) that were considered an MIC by the patients imply that even though an important improvement was perceived, many patients did not reach the state they would consider acceptable. It is, therefore, important to include more than a single interpretation threshold measure, such as the MIC, in rehabilitation and research to gain a fair and full picture of the results of a specific intervention.
Not all individuals will benefit from education and exercise for OA [29]. In the present study, around 2-4% considered their symptoms so severe after program participation that they thought the treatment had failed. The TF threshold for these patients was 5 on the NRS and between 33 and 54 on the different KOOS/HOOS-12 subscales at both follow-ups. Previous studies suggest that both adherence to treatment [29] and co-morbidities, such as obesity, depression and cardiovascular disease, may be factors relevant for not responding to treatment [30]. A recent qualitative study revealed that those who did not respond to first-line treatment believed that their low adherence to the treatment was the reason for the unsuccessful outcome and that overweight, comorbidities and psychological factors prevented them from fully participating in the exercises [31]. It may be important to identify potential non-responders to exercise and provide behavioral and motivational support in an effort to increase adherence to first-line treatment and thereby increase the chances of a satisfying outcome in these individuals [32]. Interpretation thresholds may aid in identifying non-responders/responders to specific interventions, as well as being useful as a reference in patient communication regarding treatment expectations and outcomes.
In line with studies reporting those with worse baseline symptoms to have higher threshold MIC scores for NRS and KOOS in patients with musculoskeletal disorders [33] and knee OA [11], we here found MICs in those with greater baseline pain to reflect larger improvements for all PROMs. On the other hand, those with more severe symptoms at baseline seemed to accept poorer health states at follow-up, reflected in their PASS and TF thresholds. This baseline severity dependency should be considered when interpreting MIC, PASS and TF after first-line treatment for hip and knee OA.
The large sample of patients that all participated in the same first-line treatment program for hip and knee OA is a strength of this study. Some limitations are associated with this study. To provide a valid assessment of the PROMs, the anchor questions used need to correlate with the difference in PROMs between baseline and follow-up [34]. Although the correlations between the anchor questions and the different PROMs were within the recommended acceptable range of 0.30-0.35 [34], many of the correlations were at the lower end, which may have affected the result. In the baseline severity subgroup analysis, many of the MIC scores, especially for the 12 months follow-up, were negative for the low severity group, i.e., a deterioration was regarded as an important improvement. This phenomenon has been reported previously [11] and may be attributed to recall bias and/or to a so-called "response shift", i.e., the individual may recalibrate, reprioritize or redefine the construct to be assessed over time [35,36]. Patients participating in digital treatment for OA have been reported to be younger, more often female, to have higher levels of education and more often still be working, compared to those participating in the traditional face-to-face version of the corresponding OA treatment [37]. Approximately 30% of the participants did not complete any of the follow-ups and were excluded. While the baseline PROMs were comparable, those excluded were somewhat younger, had a somewhat higher educational level and were working to a greater extent than those completing the follow-ups (Table 2). This, together with the lower response rate at 12 months follow-up, may limit the generalization of the interpretation thresholds reported in this study to the OA population as a whole. The sensitivity analysis including only those who responded to both follow-ups revealed no important changes to the MIC values at 3 or 12 months, indicating that the drop-out rate at 12 months had a minor influence on the results.

Conclusion
These interpretation threshold estimates improve our understanding and interpretation of outcomes after first-line OA interventions when assessed with the NRS Pain, KOOS-12 and HOOS-12. The baseline pain dependency of threshold estimates identified is important to consider when these values are used to interpret changes in PROMs after first-line interventions in patients with hip and knee OA.

Author contributions
AC contributed to the conception and design of the study, was in charge of manuscript writing and contributed to the interpretation of data analysis. LHI, HN, LSL, MMI and LED contributed to the conception and design of the study, interpretation of data and manuscript writing. AK contributed to the conception and design of the study, was in charge of data analysis and contributed to writing the manuscript. All authors read and approved the final version of the manuscript. AC (anna.cron-strom@med.lu.se) and AK (ali.kiadaliri@med.lu.se) take full responsibility for the integrity of the work as a whole, from inception to finished article.

Role of the funding source
This work was funded by Greta and Johan Kocks foundation and Stiftelsen för bistånd åt rörelsehindrade i Skåne.

Declaration of competing interest
AC and LHI report no conflicts of interests related to the content of this manuscript. LSL was a scientific consultant for Arthro Therapeutics AB (the company providing the digital program analyzed in this study). HN and MMI were employed by Arthro Therapeutics AB during 2021-2022. LED is co-founder and chief medical officer at Arthro Therapeutics AB. AK acts as a scientific advisor (7.5% FTE) for Arthro Therapeutics AB.

Acknowledgments
Not applicable.
Appendix A Table 2 Minimal Important Change values for KOOS/HOOS and NRS scores at 3 and 12 months follow-up by baseline severity using the predictive modeling method.