Improving the Effectiveness of Exercise Therapy for Adults With Knee Osteoarthritis: A Pragmatic Randomized Controlled Trial (BEEP Trial)

Highlights
• Exercise is recommended for knee OA but effects are small and reduce over time.
• This randomized trial compared 3 physical therapist-led exercise interventions.
• 514 adults with knee OA took part, with data collected over a 36-month follow-up.
• On average, all 3 groups experienced moderate improvement in pain and function.
• There were no significant differences between groups at 3, 6, 9, 18, and 36 months.

Exercise, including local muscle strengthening exercise and general aerobic fitness, is recommended as "core" treatment for individuals with knee osteoarthritis (OA). 1 Although systematic reviews conclude that exercise is more effective than non-exercise treatment in reducing pain and improving physical function, 2-4 the average effects are modest and decline over time, potentially explained by diminishing exercise adherence. It is unclear whether the effects of exercise for knee OA can be improved by changing the characteristics of the exercise program.
Systematic reviews highlight the importance of individualized exercise, regular exercise, supervision and follow-up, as well as educational and behavioral strategies to enhance exercise adherence. 2,3,5,6 A previous randomized controlled trial (RCT) investigating physical therapy-led exercise for knee OA, delivered over an average of 4 treatment sessions, showed that pain reduction was 3 times greater and functional improvement 4 times greater compared with standardized exercise advice. 7 Benefits declined by 6 months follow-up. A further RCT tested a more intensive physical therapy-led exercise program (average of 6 treatment sessions) and showed greater improvements in pain than those observed in the previous trial. 8 Exercise appears to be worth doing, but trials are needed that test whether outcomes can be improved through greater individualization, supervision, and progression of lower limb exercise, and whether the effects of exercise can be maintained for longer by changing the focus from lower limb exercise to overall physical activity in order to improve adherence. 6,9-11 The BEEP (Benefits of Effective Exercise for knee Pain) trial aimed to test whether knee OA-related pain and function can be improved by offering these enhanced physical therapist-led exercise interventions.

Methods

Design
A 3 parallel-group, pragmatic RCT, prospectively registered with the International Standard of Randomized Controlled Trials Number Registry (ISRCTN 93634563), with embedded health economic evaluation and linked qualitative interviews (both reported separately). 12,13 The trial was approved by the North West Research Ethics Committee in the UK (REC reference: 10/H1017/45). There were no substantial amendments to the methods of the trial. The full trial protocol was published previously. 14

Setting and participants
Participants were recruited from 65 general practices and 5 National Health Service (NHS) physical therapy services in the West Midlands and Cheshire regions of the UK. Adults aged ≥45 years with knee pain and/or stiffness who met the National Institute for Health and Care Excellence criteria for a clinical diagnosis of knee OA, 1 and who were able to read and write in English, willing to participate, able to give full informed consent, and who had access to a telephone (for minimum data collection), were eligible. Patients were excluded if they had alternative diagnoses or serious underlying pathology (eg, inflammatory arthritis); had a total hip/knee joint replacement on the affected side or were on a waiting list for a total knee/hip replacement; had contraindications to exercise interventions; or had received a physical therapist-led exercise program or an injection into the painful knee in the last 3 months.
Based on the learning from a pilot study (ISRCTN 23294263), 14 we identified potentially eligible participants in 3 ways: (1) general practice electronic record reviews to identify older adults who had consulted for knee pain in the last 12 months, (2) population survey of older adults registered with participating practices, and (3) older adults referred from general practice to physical therapy for knee pain. Individuals identified from methods (1) and (2) were mailed a brief screening questionnaire. Individuals identified from method 3 were first screened by a member of the physical therapy service team for key eligibility criteria. Those who were eligible and agreed to further contact were mailed trial information, then telephoned by a research nurse to check eligibility, discuss trial participation, and obtain written informed consent. No physical examination was conducted until trial participants' first physical therapy appointment. We therefore anticipated that a small number of participants would be found to be subsequently ineligible.

Randomization and masking
Following informed consent and receipt of the baseline questionnaire, a trial administrator randomized participants using a password-protected, computer-generated randomization schedule provided by the Clinical Trials Unit. This ensured that research nurses and the trial statistician remained blind to treatment allocation. Participants were individually randomized to 1 of 3 treatment groups with a 1:1:1 allocation ratio using random permuted blocks of size 3, stratified by physical therapy clinic. Physical therapists and participants could not be blinded to allocation, but research nurses responsible for data collection were blinded to allocation. The statistician remained blind until analysis of 18-month follow-up data (analysis of 36-month data was conducted after unblinding).
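The allocation scheme described above (permuted blocks of size 3, 1:1:1 ratio, stratified by clinic) can be illustrated with a minimal sketch. This is not the Clinical Trials Unit's actual software; the function name `make_schedule`, the clinic labels, and the fixed seed are assumptions introduced only for illustration.

```python
import random

ARMS = ("UC", "ITE", "TEA")  # the 3 trial arms, allocated 1:1:1


def make_schedule(clinics, blocks_per_clinic, seed=0):
    """Return one allocation list per clinic (stratum).

    Each block of size 3 contains every arm exactly once, in random order.
    The seed is fixed here only so that the sketch is reproducible.
    """
    rng = random.Random(seed)
    schedule = {}
    for clinic in clinics:
        allocations = []
        for _ in range(blocks_per_clinic):
            block = list(ARMS)
            rng.shuffle(block)  # random permutation within the block
            allocations.extend(block)
        schedule[clinic] = allocations
    return schedule


# Hypothetical clinic labels; the trial used 10 treatment clinics.
schedule = make_schedule(["clinic_A", "clinic_B"], blocks_per_clinic=4)
for clinic, allocations in schedule.items():
    print(clinic, allocations)
```

Because every block contains each arm once, group sizes within a clinic can never differ by more than 1 at any point in recruitment.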

Interventions
Treatment was delivered in 5 NHS physical therapy services by 47 BEEP trained physical therapists across 10 treatment clinics. Each physical therapist was trained to deliver 1 of the 3 interventions. Full details of the interventions and the differences between them have been published previously, 14 as has the content and evaluation of the physical therapy training programmes. 15 Standardized case report forms were used to record treatment details.
All patients received a BEEP trial information booklet, which included information about the value of exercise and physical activity and simple self-help messages. All patients were instructed in a home exercise program, based on best practice guidance about exercise dose, 16-18 which guided participants to continue at home with the same exercise program as that prescribed by their physical therapist. All patients could continue to access usual care in addition to BEEP treatment. 14 The content of the interventions is summarized briefly below.
Usual physical therapy care
Usual physical therapy care (UC) consisted of advice and lower limb exercise delivered in up to 4 individual, 1-to-1 treatment sessions over 12 weeks. Exercises were selected from an agreed template of commonly prescribed lower limb exercises (available from the authors on request), including muscle strengthening (non-weight-bearing and weight-bearing), range of movement, or stretching exercises. UC matched usual physical therapy practice in the NHS. 19

Individually tailored exercise
Individually tailored exercise (ITE) consisted of a supervised, individually tailored and progressed lower limb exercise program delivered in 6-8 1-to-1 treatment sessions over 12 weeks. The exercise program focused on strengthening, stretching and balance exercise, and functional task training. Agreed and defined functional and exercise goals were reviewed and progressed. Individualization was based on physical therapist assessment findings, including biomechanical and physiological observations, pain responses to specific exercises and starting levels of strength, range of movement, and balance. Participants were given a print-out of their specific exercise prescription (using PhysioTools computer software), which changed over time as the exercise program progressed. Physical therapists encouraged exercise behavior change using self-monitoring via a lower limb exercise diary to record adherence.

Targeted exercise adherence
Targeted exercise adherence (TEA) began with a focus on lower limb exercise (as in the ITE protocol) but aimed to support a transition to increasing general physical activity adherence over 6 months. It included 4 individual face-to-face treatments up to week 12, and a further 4-6 follow-up contacts (face-to-face or over the telephone) from week 12 to 6 months (a total of 8-10 treatment contacts). The target by the end of 6 months was that participants would be engaged in physical activity opportunities within their community, having had support from their physical therapist to overcome initial problems or barriers in engaging in these activities. The emphasis was therefore on maintenance of physical activity beyond the period of support from the physical therapist.
In addition to prescribing an individualized, progressed, and supervised lower limb exercise program (as per ITE), physical therapists assessed participants' current general physical activity levels, intentions to increase physical activity, attitudes to exercise for knee pain and general health, and individual barriers and facilitators to exercise. They also helped patients to identify suitable general physical activity opportunities in their local community. Each physical therapist was provided with an "adherence enhancing toolkit" that contained optional educational, behavioral, and cognitive-behavioral tools and techniques for facilitating exercise behavior change (for a summary of the contents of the toolkit, see additional file 4 in the BEEP trial protocol paper 14 ). Specific tools were selected for use with individual patients based on assessment findings.

Outcome measures and follow-up
Outcomes were measured via postal questionnaires, with reminders, at baseline, 3, 6, 9, 18, and 36 months follow-up. At 6, 18, and 36 months follow-up, minimum data were collected over the telephone by a blinded research nurse.
The primary outcomes were lower limb pain and function measured using the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), 20 6 months post randomization. Secondary outcomes were the proportion of treatment responders (Outcome Measures in Rheumatology Clinical Trials-Osteoarthritis Research Society International [OMERACT-OARSI] clinical responder criteria); 21,22 self-reported physical activity (Physical Activity Scale for the Elderly [PASE]); 23 self-reported use of local physical activity facilities in the previous 7 days (single item); exercise adherence (self-reported adherence to prescribed exercises); self-reported body mass index (calculated from self-reported height and weight); a modified version of a measure of treatment acceptability and credibility; 24,25 illness perceptions (Brief Illness Perceptions Questionnaire); 26 confidence in ability to exercise (Self-efficacy for Exercise Scale); 27 outcome expectations for exercise (Outcome Expectations for Exercise Scale 2); 28 anxiety (Generalized Anxiety Disorder Assessment 7); 29 depression (Personal Health Questionnaire Depression Scale 8); 30 self-reported health care resource use (both NHS and private health care); and overall health-related quality of life (EQ-5D-3L) 31 (for health economic analysis, reported separately 12). Seven-day accelerometry was also measured in a subsample of participants (n=89) via Actigraph accelerometers (models: GT1M, GT3X, and GTX+).

Sample size and power
A sample of 500 participants was required to detect an effect size of 0.35 for both WOMAC pain and function at 6 months follow-up, 32 with 2-tailed testing, power of 80% and an alpha level of 5%, comparing UC with either ITE or TEA. This allowed for a 20% loss to follow-up. Standard deviations for WOMAC pain and WOMAC function at 6 months follow-up were drawn from a previous trial 8 and estimated to be 5 and 17, respectively.
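As a rough check on the figures above, the standard normal-approximation formula for comparing 2 means, n per group = 2(z₁₋α/₂ + z₁₋β)² / d², can be evaluated directly. This is a sketch under that assumption; the authors' exact calculation method is not stated, and the function name `n_per_group` is introduced here for illustration only.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Participants per arm for a 2-tailed, 2-group comparison of means
    (normal approximation; effect_size is a standardized difference)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 5%
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


completers = n_per_group(0.35)             # needed with outcome data, per arm
recruited = ceil(completers / (1 - 0.20))  # inflate for 20% loss to follow-up
total = 3 * recruited                      # 3 arms

# Effect size 0.35 expressed on the WOMAC scales, using the stated SDs.
pain_difference = 0.35 * 5        # WOMAC pain points
function_difference = 0.35 * 17   # WOMAC function points

print(completers, recruited, total)
print(pain_difference, function_difference)
```

Under these assumptions the approximation gives 129 participants per arm with outcome data, 162 recruited per arm, and 486 in total, which is of the same order as the 500 stated.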

Statistical analysis
The statistical analysis plan for the BEEP trial has been previously published. 14 Briefly, primary and secondary treatment models were derived separately at each follow-up time-point by comparing ITE and TEA interventions to UC on an intention-to-treat basis, using analysis of covariance or logistic regression as appropriate within STATA (v15). Results were presented as mean or percentage differences, as appropriate, with 95% confidence intervals, after adjustment for age, sex, duration of the knee problem, physical therapy treatment clinic, and the baseline score of the outcome of interest and after missing data were imputed using previously used methods 33 (described in detail in appendix 1). Sensitivity analyses were performed, including per protocol analysis; adjusting treatment models for therapist effects; excluding a priori covariates from the analysis; and not imputing missing data. Accelerometry data were analyzed using the same methods as for the primary and secondary outcomes; however, missing data were not imputed.
The longitudinal trajectories of the WOMAC and PASE scores were modeled by treatment group using generalized estimating equations after adjusting for the a priori covariates previously defined. Exercise adherence was reported descriptively, and the 6-month treatment effects for WOMAC pain and function explored to see if treatment effect differed depending on exercise adherence. This analysis was conducted by including exercise adherence as a main effect, and as an interaction with treatment, in the models for the primary analysis.

Results
Participant flow and characteristics
Twelve randomized participants were subsequently found to be ineligible at their first BEEP trial physical therapy assessment (see fig 1) and were excluded from follow-up and analyses. Therefore, 514 participants form the dataset for the BEEP trial.
There were no important differences between groups at baseline (table 1). On average, participants had moderate pain and disability, symptom durations of between 1 and 5 years, were overweight and reported low physical activity levels, but were positive about the ability of treatment to help their knee problem and had generally positive expectations about the benefits of exercise.
Primary outcome data were obtained from 457 (87%) of participants at 6 months (157 (89%) in UC, 153 (86%) in ITE and 147 (85%) in TEA, respectively) (fig 1). Participants lost to follow-up at 6 months had slightly worse baseline WOMAC knee pain and function scores and slightly higher levels of anxiety and depression at baseline than those who returned follow-up data. At baseline, they also were less likely to report having used local physical activity facilities or opportunities in the last 7 days.
No serious adverse events were reported. Four adverse events attributable to the interventions were reported: 1 in UC (sprained ankle), 2 in ITE (sprained ankle and twisted painful knee), and 1 in TEA (fall while walking). Eighty-two participants experienced muscle soreness or transient increases in pain/aching (UC n=31 (19%), ITE n=33 (20%), TEA n=18 (12%)). Over the 3-year follow-up, there was 1 death, which was not attributable to the BEEP trial interventions.

Primary and secondary outcomes
There were no significant differences in the change in WOMAC pain or function at 6 months between UC and either ITE or TEA (see fig 3 and table 3). Longer-term outcomes at 9, 18, and 36 months remained at similar levels to those seen at 6 months and findings were similar with adjustment for baseline imbalance, imputation of missing data, and adjustment for within-physical therapist clustering (appendix 3). The per protocol analysis also demonstrated no statistically significant differences in the change in WOMAC pain or function scores (table 3). The longitudinal analysis of the mean outcome trajectory for the WOMAC pain and function scores also did not show any significant differences in the mean trajectory by intervention group. There were within-group improvements in pain and function in all 3 groups, with most improvement occurring in the first 3 months, but no significant differences between the groups at any time-point.
Analyses of secondary outcomes showed consistent results overall, with no statistically significant differences in the change in outcomes between UC and either ITE or TEA (table 4), although the UC and TEA comparison for the PASE score at 9 months was marginally significant. Participants in all 3 groups, on average, reported that they felt they had greater control over their knee problem and were less concerned about their knee problem compared with baseline. At 6 months, the proportion of participants in all groups meeting the OMERACT-OARSI responder criteria was around 50% and remained relatively stable over the longer-term follow-ups.
Overall, perceived levels of treatment acceptability and credibility were high in all groups and remained so even at 36 months follow-up (table 5). At 3 months, exercise adherence was high in all groups, with 75% or more agreeing or strongly agreeing that they had completed their exercises as often as they had been advised. In the UC group this reduced at 6 months, and then fell below 50% at longer-term follow-ups. Self-reported exercise adherence was maintained at higher levels for longer in the ITE and TEA groups, but differences between groups at the longer-term follow-ups (18 and 36 months) were small. Results for the primary outcome at the primary endpoint also did not differ depending on whether the participant reported high or low levels of exercise adherence (appendix 4).
Physical activity, measured using the PASE, and via accelerometry within the subsample (that met the criteria for valid wear time), showed similar small increases in all 3 groups at 6 months but returned to baseline (or below baseline) levels at 36 months. Self-reported use of physical activity facilities in the last 7 days increased in all 3 groups from baseline to 6 months and stayed higher even at longer-term follow-up.

Discussion
The key findings from this large, pragmatic trial are that there were no significant differences between the intervention groups. Results of secondary outcomes and sensitivity analyses, including a treated-per-protocol analysis, support the same conclusion. Thus, there is no evidence that our approaches to increasing individual tailoring of, and targeting adherence to, exercise for adults with knee OA, are more effective than UC for improving pain and physical function. All 3 groups showed improvements in pain and function in line with those reported in previous meta-analyses, 2 and approximately half of participants in all 3 treatment groups were classified as treatment responders.
The lack of significant differences between groups adds to the debate about the mechanisms of effect of exercise for knee OA. 34-36 It questions the assumption that "doing more" lower limb exercise, with greater individualization, exercise progression, and supervision, leads to better pain and function. It also questions the assumption that supporting patients to identify and engage with general aerobic physical activities they enjoy leads to greater adherence to exercise and activity, and thus greater improvements in pain and function. The good outcomes, on average, achieved in the usual care intervention group and indeed by all 3 groups suggest that other factors may be important. Our nested qualitative research in this trial highlighted potential explanations that include the value of reassurance from physical therapists that exercise was safe, the opportunity to exercise with the support of a physical therapist who could address their concerns about exercise, and the therapeutic relationship they developed with the physical therapist. 13 However, there may be several other explanations (and potential trial limitations) for the results: a lack of sufficient difference between interventions, and a lack of intervention fidelity, particularly in TEA. Our trial was designed and delivered within the UK NHS, and thus the decision about the number of treatment contacts was influenced by what physical therapists and their managers perceived would be deliverable, given that current practice is typically an average of up to 4 treatment sessions. 19 We protocolized between 8 and 10 treatment contacts in the TEA intervention and, on average, participants received 7. This may not have been sufficient to facilitate long-term behavior change.
Within TEA, overall, fidelity was low with only the "simpler" educational and behavioral tools (eg, written education materials, physical activity diaries, and pedometers) being frequently used. The range of tools within the adherence-enhancing toolkit offered considerable flexibility to the physical therapists, which may have inadvertently diluted the intended focus on increasing exercise adherence in this group. This trial also provides no evidence that the interventions we tested lead to greater sustained changes in physical activity levels at longer-term follow-up at 18 or 36 months. While self-reported exercise adherence appeared to stay higher for longer in TEA, physical activity levels had returned to baseline levels (or below) by the longer-term follow-ups, indicating no sustained behavior change 1 year after the end of physical therapy contact. Despite this reduction in adherence, pain and function outcomes did not regress to baseline values. While this is difficult to explain, it could be related to the reassurance about exercise from physical therapists, resulting in patients being less worried about their knee OA and therefore reporting less pain and dysfunction. It might alternatively be explained by patients being recruited into the trial at a time when they are experiencing an exacerbation of symptoms. Overall, the results show that our attempts to increase individual tailoring of, and better target adherence to, exercise for adults with knee OA, were not more effective than usual exercise based care for improving pain and physical function. It is possible that different efforts are needed to sustain longer-term exercise and physical activity behavior.

Comparison with other research
Overall, the proportion of treatment responders in each arm was similar or better, at 6 months, than reported in previous trials, 7,8,37 and was maintained at 36 months follow-up. Interestingly, UC was associated with a higher proportion of treatment responders than in some previous trials. 7 Our results are similar to an Australian trial comparing different exercise approaches (neuromuscular vs quadriceps muscle strengthening) for patients with knee OA. 38 This also showed that while all groups reported improvements in pain and function over

NOTE. All figures are based on data after multiple imputation of missing data has been applied (with the exception of the accelerometer data, comorbidity data, data on marital status, employment status, previous exercise experience, and pain at other body sites). WOMAC higher score=worse outcome; PASE higher score=more active; SEE higher score=more confident that exercise can be done; OEE positive and negative subscales higher score=higher expectations that exercise will be beneficial; GAD-7 higher score=more anxious; PHQ-8 higher score=more depressed; IPQR-affects life, higher score=more affected; IPQR-duration, higher score=lasts a longer time; IPQR-personal control, higher score=more control; IPQR-treatment control, higher score=higher belief treatment can control; IPQR-symptom experience, higher score=more symptoms that are more severe; IPQR-concern, higher score=more concerned; IPQR-understanding, higher score=more understanding; IPQR-emotion, higher score=more emotionally affected.
* Numbers are N (percentage).
† Defined using the pain regions of the Manchester definition of widespread pain. 41
‡ Includes both prescribed and over-the-counter medications.
§ Reverse scored, that is, a higher score on the negative subscale indicates higher expectations of the benefit of exercise.
|| Median (interquartile range).
¶ Participants are only included if they have worn the monitor for at least 5 days for 10 hours or more. Valid time is calculated assuming that any consecutive runs of zero count lasting for 60 minutes or more are counted as non-wear.
# An average score was calculated for each participant by averaging the total number of counts across the valid time for which the accelerometer was worn.
** Defined as participants completing 150 minutes each week of moderate intensity physical activity (accumulated in bouts of 10 minutes or more) or 75 minutes of vigorous intensity activity spread across the week (adapted from Regnaux et al 39). Bouts calculated using a drop-time of 2 minutes. 40 Missing days of data are imputed using the average of the average count for days where data are present.

NOTE. Figures are presented after imputation of missing data (with the exception of data from the accelerometers) and after adjustment for the baseline score on the outcome of interest (with the exception of the OARSI responder criteria), age, sex, onset of knee problem, and treatment center, unless otherwise stated. WOMAC stiffness higher score=more severe stiffness; PASE higher score=more active; SEE higher score=more confident that exercise can be done; OEE positive and negative subscales higher score=higher expectations that exercise will be beneficial; GAD-7 higher score=more anxious; PHQ-8 higher score=more depressed; IPQR-affects life, higher score=more affected; IPQR-duration, higher score=lasts a longer time; IPQR-personal control, higher score=more control; IPQR-treatment control, higher score=higher belief treatment can control; IPQR-symptom experience, higher score=more symptoms that are more severe; IPQR-concern, higher score=more concerned; IPQR-understanding, higher score=more understanding; IPQR-emotion, higher score=more emotionally affected.
Abbreviations: CI, confidence interval; IQR, inter-quartile range; SD, standard deviation.
* Participants met the OARSI responder criteria if (a) relative change in WOMAC pain or function was ≥50% and absolute change was ≥20 or (b) at least 2 of the following applied: relative change in pain ≥20% and absolute change ≥10, relative change in function ≥20% and absolute change ≥10, or participants reported they were improved, much improved, or completely recovered on the global assessment of change question.
Absolute change (baseline score minus follow-up score) and relative change (absolute change/baseline score) were calculated after WOMAC measures were scaled from 1 to 101 to avoid dividing by 0 when calculating relative change. 26
† Data not collected at this time point.
‡ Mean differences are presented, despite a skewed distribution for the outcome at the absolute time point, as, when adjusted for the baseline value of interest, model residuals followed a normal distribution.
§ Reverse scored, that is, a higher score on the negative subscale indicates higher expectations of the benefit of exercise.
|| Participants are only included if they have worn the monitor for at least 5 days for 10 hours or more. Valid time is calculated assuming that any consecutive runs of zero count lasting for 60 minutes or more are counted as non-wear.
¶ An average score was calculated for each participant by averaging the total number of counts across the valid time for which the accelerometer was worn.
# Model adjusted for baseline only (adjusting for all a priori model covariates gave unstable model results due to the small sample size used for the analysis).
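
The OARSI responder rule given in the table notes above can be written out as a short sketch. It assumes WOMAC pain and function scores have already been rescaled to the 1 to 101 range described in the notes (higher = worse); the function names and the example scores are hypothetical and are not trial data.

```python
def changes(baseline, follow_up):
    """Return (relative, absolute) change for a score on the 1-101 scale.

    Absolute change is baseline minus follow-up, so improvement is positive.
    The 1-101 scaling guarantees the baseline is never 0.
    """
    absolute = baseline - follow_up
    relative = absolute / baseline
    return relative, absolute


def is_responder(pain_base, pain_fu, func_base, func_fu, globally_improved):
    """Apply criteria (a) and (b) from the table note."""
    pain_rel, pain_abs = changes(pain_base, pain_fu)
    func_rel, func_abs = changes(func_base, func_fu)

    # (a) large improvement in pain or in function
    high = (pain_rel >= 0.50 and pain_abs >= 20) or (
        func_rel >= 0.50 and func_abs >= 20
    )

    # (b) at least 2 of: moderate pain improvement, moderate function
    # improvement, self-reported global improvement
    moderate_count = sum([
        pain_rel >= 0.20 and pain_abs >= 10,
        func_rel >= 0.20 and func_abs >= 10,
        bool(globally_improved),
    ])
    return high or moderate_count >= 2


# Hypothetical participants (illustrative scores only).
print(is_responder(61, 21, 51, 51, False))  # large pain improvement alone
print(is_responder(51, 39, 51, 51, True))   # moderate pain + global improvement
print(is_responder(51, 39, 51, 51, False))  # moderate pain improvement alone
```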

Strengths and limitations
Although this trial had a large sample size, good follow-up rates over 36 months, participation of many physical therapists, and a diverse sample (eg, in terms of comorbidities), it did have potential limitations. In addition to the potential lack of sufficient difference between interventions, and intervention fidelity (particularly in TEA), we did not adjust for multiple testing in our analyses of the 2 primary outcomes and comparisons (UC vs ITE, UC vs TEA). However, given the nonsignificant results, this would not change our conclusions.

Clinical and research recommendations
Although the usual exercise-based physical therapy intervention could be considered best practice, the consistent observation of decline in physical activity in all 3 groups after the end of physical therapy contact suggests that interventions that effectively increase exercise adherence need to be developed and tested. Furthermore, while the trial showed that 50% or more of participants could be classified as treatment responders, this means that up to half could not. Further research that leads to better understanding, and easier identification and prediction, of those patients who do and do not benefit from exercise, and from different types and intensities of exercise, would be useful to better target exercise treatments in future.

* Total treatment credibility score is calculated as the sum of the 4 treatment credibility questions when each question is coded on a 0-4 scale; higher score=treatment more credible.

Appendix 1. Strategy to impute missing data
Multiple imputation was used to impute missing data at all time-points for outcomes shown in Appendices 6 and 7 (excluding accelerometry), and for the adjusting covariates used in the regression models. Despite not having any missing data, the variable representing the treatment effect was included in the imputation model as this variable was included in the analysis models. An imputation model was fitted using Multiple Imputation by chained equations (MICE) in STATA version 15.0 [1] and assumed that data were missing at random, as evidenced by describing the baseline characteristics of participants with and without data at each time-point. Twenty-five imputed data sets were derived to ensure the number of imputations exceeded the overall percentage of missing data [2]. Continuous outcome measures were modelled using predictive mean matching (nearest neighbours = 1 [1]) and ordinal outcomes were modelled using ordinal regression. Predictive mean matching was chosen so that the imputed values remained on the same scale as their original outcome and because this method is particularly suited to modelling skewed data [2]. All ordinal outcomes were imputed using the "augment" option in STATA to avoid the problem of perfect prediction [1]. The imputation model for the global assessment of change (GAC) outcomes also included the "ascontinuous" option [1]. This option was used to ensure that the imputation model for the GAC outcomes converged. It works by imputing the GAC outcomes using ordinal regression, but when these outcomes are included as predictor variables in the imputation model for other outcomes they are assumed to be continuous variables, rather than categorical, to reduce the number of degrees of freedom in the imputation model. This model simplification was needed to ensure the imputation model converged and was a reasonable assumption given that the GAC outcomes are measured using a relatively large number of response categories (six in total). After the imputation model had been applied to the data, Rubin's rules [3] were used to combine treatment effects (and their associated standard errors) across the imputed data sets to provide a single estimate of treatment effect for each analysis outcome.

a The interaction regression coefficients express the interaction between treatment arm and level of exercise adherence after the outcome of interest has been adjusted in a regression model for baseline in the outcome of interest, age, gender, duration of the knee problem and physiotherapy treatment clinic (as was used in the randomisation algorithm).
53 participants reported they were "not sure" if they had been doing their exercises as often as advised so were excluded from the analysis.
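
The pooling step that Appendix 1 attributes to Rubin's rules can be sketched as follows. The trial analysis was run in STATA; this Python version only illustrates the standard formulas (pooled estimate = mean of the per-dataset estimates; total variance = within-imputation variance + (1 + 1/m) × between-imputation variance). The function name `pool_rubin` and the input numbers are hypothetical and are not trial results.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Combine treatment effects from m imputed data sets.

    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in standard_errors])  # average squared SE
    between = variance(estimates)                       # sample variance (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Illustrative inputs only: 3 imputed data sets rather than the trial's 25.
effect, se = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(effect, se)
```

The pooled standard error is larger than any single data set's standard error whenever the estimates disagree, which is how the extra uncertainty from imputation is carried into the confidence intervals.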