The effect of spinal manipulative therapy on pain relief and function in patients with chronic low back pain: an individual participant data meta-analysis

Background A 2019 review concluded that spinal manipulative therapy (SMT) results in similar benefit compared to other interventions for chronic low back pain (LBP). Compared to traditional aggregate analyses, individual participant data (IPD) meta-analyses allow for a more precise estimate of the treatment effect.


Introduction
Low back pain (LBP) is the leading cause of pain and disability worldwide, and has a major socioeconomic impact [1]. Non-pharmacological approaches are the first choice of treatment as the risk of adverse events is lower than with pharmacological approaches [2]. One non-pharmacological approach includes spinal manipulation or mobilization, collectively known as spinal manipulative therapy (SMT). SMT is used by a variety of health care providers such as chiropractors, osteopaths, manual therapists and physiotherapists.
Many systematic reviews and meta-analyses have analysed the effects of SMT and suggest that it is an effective intervention for the reduction of pain and improvement of function [3][4][5]. However, recommendations for SMT in international guidelines for chronic LBP are not consistent [6][7][8]. Since each guideline development group is using the same evidence, this is likely to be a consequence of differences in how groups approach appraisal and interpretation of the evidence.
One disadvantage of traditional meta-analyses is that aggregate data are extracted at the study level and the investigator is dependent upon how the data are analysed and presented. Individual participant data (IPD) meta-analysis circumvents the issues of poor reporting and of not correcting for baseline covariates, because the individual data are available, resulting in a more precise and potentially more valid estimate of the effect.
Our recent systematic review of SMT for chronic LBP [5] reflects some of the potential limitations of traditional aggregate meta-analysis. For example, the authors of included studies used different definitions of LBP, included a few subacute LBP patients, used different frequencies of treatments, and used different analytic techniques ranging from a t-test to sophisticated regression models. In an IPD meta-analysis, some of these problems can be resolved.
The specific objective of this IPD meta-analysis was to assess the effectiveness of SMT compared to any other conservative therapy for primary outcomes (i.e. pain and back-related function) and secondary outcomes (i.e. quality of life, recovery, return-to-work, medication use and treatment satisfaction) at one, three, six and twelve months in adults with chronic LBP.

Methods
This study was conducted according to the Preferred Reporting Items of Systematic Reviews and Meta-Analyses for IPD (PRISMA-IPD) guidelines [9] (Appendix eTable 1). The protocol was registered with PROSPERO (https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=25714) and approved by the Scientific Review Board of the Vrije Universiteit Amsterdam and by the Ethical Committee of the VU University Medical Centre Amsterdam (Projectnr. 2015.544).
A detailed description of our study design and procedures was published previously [10]. The methodology presented here gives a brief overview.

Search methods for identification of new studies
We included RCTs published from the year 2000. We applied this limit because it is difficult to trace authors of older trials, and there is a high probability that these data would not be accessible. More importantly, more recent studies of SMT for low-back pain are of better methodological quality. Therefore, it is unlikely that this cut-off will have introduced undesirable bias [11]. Studies in the 2011 Cochrane Review which examined the effect of SMT for chronic LBP were included [12]. In addition, we updated the search in December 2016 following the same procedure used in the Cochrane review (Appendix eTable 10) [5,10]. This was supplemented with reference checking of systematic reviews and meta-analyses, and personal communication. A recent update of the search (May 2018) resulted in the identification of five new trials [5], all of which were small in size and considered to have a high risk of bias. An updated search covering May 2018 until October 2020 identified five studies [13][14][15][16][17], three of which were small in size. The two large studies, one of which examined SMT vs recommended therapies and the other SMT as an adjuvant therapy, reported results similar to ours.

Study selection
Type of studies and participants
Inclusion criteria. Only randomized clinical trials (RCTs) were included. Studies were included if they recruited adults (≥18 years of age) with chronic (≥12 weeks duration) LBP. LBP was defined as LBP not attributed to a specific pathology (e.g. infection, fracture, tumour or radicular syndrome). Participants with diffuse leg pain due to a low-back condition were included, as were participants from primary or secondary care. In those studies where a mixed population was included (e.g. subacute and chronic), we included, where possible, only those participants with >12 weeks of LBP.
Exclusion criteria. We excluded studies that: 1) used an inadequate randomization procedure (e.g. alternate allocation, allocation based on birth date); 2) included participants with LBP and other conditions such as pregnancy or postoperative participants; 3) tested the immediate effect of a single treatment only; and 4) compared the effects of a multimodal therapy including SMT to another therapy, or used any other study design whereby the contribution of SMT could not be isolated.

Types of interventions
Experimental intervention. Studies of spinal manipulation (i.e. high-velocity low-amplitude techniques) as well as mobilization (i.e. low-velocity low-amplitude techniques) were included.
We based the definition of 'recommended' and 'non-recommended' interventions on recent international guidelines for LBP from the USA [8], the UK [6], the Netherlands [7] and the COST B13 European guidelines [18]. We categorized an intervention as 'recommended' or 'non-recommended' when this was consistently stated in at least two of these guidelines.

Types of outcome measures
Primary outcomes were self-reported pain and back-specific functional status.
Secondary outcomes included self-reported health-related quality of life, return-to-work, global improvement (i.e.perceived recovery), treatment satisfaction and analgesic use.

Risk of bias in individual studies
The 13 risk of bias criteria (scored as 'low risk', 'high risk' or 'unclear risk') recommended by the Cochrane Back and Neck group were used (Appendix eTable 2) [19]. Risk of bias was assessed by two independent reviewers (SMR, AdeZ). To adjudicate disagreement, a third reviewer (RO) was contacted.
Data of all participants were sought from the authors of the studies fulfilling the inclusion criteria. We extracted study characteristics, patient characteristics, types of outcomes, duration of follow-up, and descriptions of the experimental and control interventions.

Preparing data for analyses
We first compared the original data with the published data to check for completeness and, where necessary and possible, attempted to resolve any discrepancies. All variables were harmonized in a data harmonization platform [10].
All outcomes were pooled following a decision rule (Appendix eTable 5). All pain scores were converted to a 0-100 point pain scale. To allow pooling of different functional status measures, we recoded the individual scores into Z-scores for each separate time point, using pooled standard deviations as the denominator ($Z_i = (x_i - \bar{x}) / SD$). Analysing these Z-scores resulted in standardized mean differences (SMDs). To ease interpretation of the SMDs, we converted these to a mean difference (MD) on the 24-point Roland Morris Disability Questionnaire (RMDQ) by multiplying the SMD by the population standard deviation (SD) of the studies measuring the RMDQ ($SD_{pooled} = \sqrt{\sum_i (n_i - 1) S_i^2 / \sum_i (n_i - 1)}$, where $n_i$ = sample size for each trial and $S_i$ = standard deviation for each trial). For quality of life, the physical and mental component scales of the SF12 and SF36 were combined.
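The harmonization steps in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the pain conversion is assumed to be a simple linear rescaling, and the pooled SD is assumed to be the conventional estimate that weights each trial variance by its degrees of freedom (n_i - 1). The actual decision rules are those in Appendix eTable 5.

```python
import math
from statistics import mean


def rescale_pain(score, scale_max):
    """Linearly map a pain score from a 0..scale_max scale onto 0..100 (assumed rule)."""
    return 100.0 * score / scale_max


def pooled_sd(sample_sizes, sds):
    """Pooled SD across trials, weighting each trial variance by n_i - 1 (assumed form)."""
    numerator = sum((n - 1) * s ** 2 for n, s in zip(sample_sizes, sds))
    denominator = sum(n - 1 for n in sample_sizes)
    return math.sqrt(numerator / denominator)


def z_scores(values, sd):
    """Centre scores on their mean and divide by the supplied (pooled) SD."""
    centre = mean(values)
    return [(x - centre) / sd for x in values]


def smd_to_md(smd, sd_reference):
    """Express an SMD in the units of a reference scale, e.g. the 24-point RMDQ."""
    return smd * sd_reference
```

For instance, with a hypothetical SMD of -0.2 and a hypothetical RMDQ SD of 5 points, `smd_to_md(-0.2, 5.0)` corresponds to a difference of about one RMDQ point in favour of SMT.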
Other secondary outcomes were all dichotomized (Appendix eTable 5). However, data were often insufficient (fewer than three trials) to perform any analyses for these outcomes. Adverse events were not included in our protocol but we did examine these data.
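As a rough illustration of the dichotomization, the sketch below encodes the classification rules given in the footnote to the secondary-outcomes table of this article (more than 50% improvement for recovery, scores above 75% for satisfaction, any medication use, and return to work or no recorded sick days). The function names, argument names and the category labels are hypothetical; the trials recorded these outcomes in different formats, and the definitive rules are in Appendix eTable 5.

```python
def is_recovered(percent_improvement=None, global_rating=None):
    """Recovered: more than 50% improvement, or rated (much) better, or no symptoms."""
    if percent_improvement is not None and percent_improvement > 50:
        return True
    # The label strings are placeholders for whatever categories a trial used.
    return global_rating in {"better", "much better", "no symptoms"}


def is_satisfied(score_percent=None, rating=None):
    """Satisfied with care: (completely) satisfied, or a score above 75%."""
    if score_percent is not None and score_percent > 75:
        return True
    return rating in {"satisfied", "completely satisfied"}


def uses_medication(n_medications):
    """Medication use: taking any medication for LBP."""
    return n_medications > 0


def returned_to_work(back_at_work=None, sick_days=None):
    """Return to work: back at work, or no sick days recorded."""
    return bool(back_at_work) or sick_days == 0
```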

Data synthesis and analysis
All analyses were based on the intention-to-treat (ITT) principle. Our primary analyses consisted of one-stage IPD meta-analyses for the five main comparisons at one, three, six and twelve months follow-up (see protocol) [10]. These intervals are standard follow-up moments for treatment in LBP. We did not examine the effect of SMT directly post-intervention as there was a large variation in duration and frequency of treatments among the studies. Furthermore, many studies contained no follow-up data immediately following the end of treatment. Longitudinal analyses for all time points simultaneously were not performed as the models were deemed too computationally demanding.
Analyses used a random-effects analysis of covariance model, adjusted for the baseline value of the outcome and fitted with restricted maximum likelihood (REML), in which a separate intercept and a separate residual variance were specified for each study. Models extended with a separate baseline adjustment term per trial failed to converge in most analyses, so we omitted this term from all analyses [20].
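One common way to write a one-stage model of this kind is shown below. It is a sketch consistent with the description above rather than the exact specification used, and the notation is an assumption: $y_{ij}$ is the follow-up outcome of participant $j$ in trial $i$, $y^{0}_{ij}$ the baseline value, $T_{ij}$ the treatment indicator (1 = SMT), $\alpha_i$ the trial-specific intercept, $\beta$ the common baseline adjustment, $\theta$ the pooled treatment effect, $u_i$ the random trial-level deviation in treatment effect, and $\sigma_i^2$ the trial-specific residual variance.

```latex
\begin{align}
y_{ij} &= \alpha_i + \beta\, y^{0}_{ij} + (\theta + u_i)\, T_{ij} + \varepsilon_{ij}, \\
u_i &\sim N(0, \tau^2), \qquad \varepsilon_{ij} \sim N(0, \sigma_i^2).
\end{align}
```

In this notation, the extended model that did not converge would replace the common $\beta$ with a trial-specific $\beta_i$.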
The pooled treatment effect of SMT was estimated using an MD or SMD (for continuous outcomes) or an odds ratio (for dichotomous outcomes), including the 95% CI. A negative MD for pain and a negative SMD for function favour SMT, while a positive MD for quality of life favours SMT.
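For readers less familiar with odds ratios, the sketch below computes an odds ratio and a Wald-type 95% CI from a single 2x2 table. It is a deliberately simplified illustration: the odds ratios in this study come from the pooled models described above, not from this formula, and the function name and cell labels are hypothetical. The function assumes all four cells are non-zero.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald CI on the log scale.

    a, b: events and non-events in the SMT group
    c, d: events and non-events in the comparison group
    All four counts must be greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

An odds ratio of 1 indicates no difference between groups; a CI that includes 1 is compatible with similar benefit.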
We did not assess the effects of imputing missing data on outcomes. We addressed the missing outcome data (see results: characteristics of studies).

Subgroup and sensitivity analyses
Subgroup analyses were pre-specified in our protocol [10] and conducted for the following variables: 1) type of clinician (i.e. chiropractor vs other); 2) 'multi-modal' SMT (i.e. SMT delivered alone as opposed to SMT delivered in conjunction with other modalities that have limited or no effect); 3) country where the study was conducted (USA vs other); 4) only chronic LBP participants (some trials included participants with subacute LBP); and 5) only trials with exercise therapy as a comparator. We conducted sensitivity analyses for studies: 1) with low risk of bias on random sequence generation and allocation concealment; 2) with overall low risk of bias (defined as fulfilling six or more of the criteria items); 3) with a follow-up period of eight weeks (data from eight weeks follow-up analysed with the three months instead of the one month data); and 4) where we were able to reproduce published results.
Furthermore, sensitivity analyses were performed by calculating functional status scores ourselves instead of using the received overall score. Also, we examined the different functional status measures (e.g. RMDQ) and pain scales (e.g. average pain, pain intensity) separately.
Lastly, to examine whether the RCTs included in this IPD meta-analysis were a representative sample of all known RCTs published since 2000, we conducted a two-stage sensitivity analysis wherein we examined the effect sizes of RCTs included in this IPD meta-analysis vs those which were eligible for inclusion, but for which no IPD was available (using published aggregate data) [5].
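A two-stage analysis first obtains one effect estimate and variance per trial and then pools them. The sketch below shows one widely used pooling step, the DerSimonian-Laird random-effects estimator. Which estimator was actually used for the two-stage analysis is not stated here, so this choice, like the function name, is an assumption made for illustration.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-trial effects with DerSimonian-Laird random-effects weights.

    Returns (pooled effect, standard error, between-trial variance tau^2).
    """
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    k = len(effects)
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    # Method-of-moments estimate of tau^2, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return pooled, se, tau2
```

Running the same function separately on trials that supplied IPD and on trials with published aggregate data only gives two pooled estimates that can then be compared.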
Clinical relevance was classified as a small, medium or large effect, based on the recommendations of the Cochrane Back and Neck group [21,22]. The overall quality of the evidence for each outcome was evaluated using GRADE [23] adapted for IPD (see Appendix eTable 6).

Identification of trials
In total, 43 RCTs met our inclusion criteria, of which 21 (50%) provided data (Fig. 1) representing 4223 participants. In three trials, the results differed from the published results for the primary outcomes by more than five percent (i.e. 5 points on a 0-100 point visual analogue scale, and 1.2 points on the 0-24 RMDQ), which we determined to be a relevant difference. Of these, one trial provided only data from participants who gave consent to share their data [40]. For another study, we received data for more participants and longer follow-up than published [38], while for the third study, baseline data were very similar but the results of our analyses deviated somewhat from the published results due to different patient numbers and the use of different statistical techniques; this was a small trial (n = 41) and therefore these deviations were not likely to influence the results presented here [41].
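The five percent criterion translates directly into scale points: 5% of a 0-100 scale is 5 points, and 5% of the 0-24 RMDQ is 1.2 points. A minimal sketch of such a reproducibility check is given below; the function name and the example values in the note that follows are hypothetical.

```python
def is_relevant_difference(reanalysed, published, scale_max, fraction=0.05):
    """Flag a discrepancy larger than `fraction` of the scale range (0..scale_max)."""
    threshold = fraction * scale_max
    return abs(reanalysed - published) > threshold
```

For example, a reanalysed mean of 30 against a published mean of 22 on a 0-100 scale would be flagged, whereas a 1-point discrepancy on the RMDQ would not.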
Sample sizes ranged from 21 to 1334 (median = 192; IQR = 45-271). However, some trials included multiple arms, and some included non-chronic LBP patients; therefore, the sample size for a given comparison should be considered potentially smaller. The included trials varied with respect to the recruitment method, type of SMT technique, number and duration of treatments and type of practitioner (Appendix eTable 3).
Of the 4223 participants, 2249 were randomized to the SMT group and 1974 to the comparison group. Table 1 presents the patient characteristics at baseline for SMT vs recommended interventions. Data for the other comparisons are tabulated in eTable 7 (Appendix).
Missing data for primary outcomes ranged from 11% at one month to 21% at 12 months. The UK BEAM trial provided the largest dataset (n = 1334) and as a result, contributed most to the missing outcome data (50% of the total amount). The UK BEAM authors did not find a difference across randomized groups between responders and non-responders, and drop-out appeared to be unrelated to the treatment [43].

Effect of SMT on primary and secondary outcomes: one stage meta-analysis
Negative point estimates of the mean difference (MD) or standardized mean difference (SMD) favour SMT.

1) SMT vs recommended interventions
Pain and function improved by the end of treatment and this improvement was sustained up to twelve months after randomization for all groups (Appendix eFigs. 3 and 4).
Primary outcomes
Pain. There is moderate quality evidence that SMT has similar benefit to recommended interventions at all time points (largest difference at three months; Table 2).
Functional status. There is moderate quality evidence that SMT has similar benefit to recommended interventions at all time points (largest difference at one month; Table 2).
A subgroup analysis for SMT vs exercise showed similar results (see Appendix eTable 8).

Secondary outcomes
There is moderate quality evidence that SMT results in a medium reduction in medication use compared to recommended interventions at two of the four time points (largest difference at six months). For all other secondary analyses, there is low to high quality evidence that SMT has a similar benefit to recommended interventions (Table 3).

2) SMT vs non-recommended interventions
Primary outcomes
Pain. There is moderate quality evidence that SMT has similar benefit compared to non-recommended interventions at one and six months (largest difference at six months). There are insufficient data for the three and twelve months analyses (Table 2).
Functional status. There is moderate quality evidence that SMT has similar benefit compared to non-recommended interventions at one, three, and six months (largest difference at six months). There are insufficient data for the twelve months analysis (Table 2).

Secondary outcomes
Quality of life. There is low quality evidence that SMT has a similar benefit to non-recommended interventions at one and six months (largest difference at six months). There are insufficient data for the three and twelve months analyses (Table 3).

3) SMT vs Sham SMT
The analysis for this comparison was not performed, because we only had data from one study [44].

4) SMT + intervention vs intervention alone
Primary outcomes
Pain. There is moderate quality evidence that SMT + intervention has a similar benefit compared to intervention alone at one, three and twelve months and low quality evidence that SMT + intervention has a similar benefit to intervention alone at six months (largest difference at one month) (Table 2).
Functional status. There is moderate quality evidence that SMT + intervention has similar benefit compared to intervention alone at one, three and twelve months and low quality evidence that SMT + intervention has similar benefit compared to the intervention alone at six months (largest difference at three months) (Table 2).

Secondary outcomes
Quality of life. There is moderate quality evidence that SMT + intervention has similar benefit compared to the intervention alone at one, three and twelve months and low quality evidence that SMT + intervention has similar benefit to the intervention alone at six months (largest difference at twelve months) (Table 3).

5) Manipulation vs mobilization
Pain. There is moderate quality evidence that manipulation has a similar benefit compared to mobilization at one month (Table 2).
Functional status. There is moderate quality evidence that manipulation has a similar benefit compared to mobilization at one month (Table 2).
There are no data for the other time points and secondary outcomes.

Subgroup and sensitivity analyses
The results from all one-stage sensitivity analyses suggest similar results for pain and functional status at all time points (Appendix eTable 8).
We found no differences in pain and functional status between RCTs included and eleven eligible RCTs not included in the IPD repository (Table 4 and Appendix eTable 9). The results of the two-stage analysis were comparable with the one-stage analysis. Sensitivity analysis, including studies published since 2016, did not change our results.

Discussion
Our results suggest there is moderate quality evidence that SMT has similar effects to recommended treatments for pain reduction and improved functional status at short-, intermediate- and long-term follow-up. Additionally, there is moderate evidence that SMT has similar effects for pain relief and improvement in function when compared to non-recommended therapies and when examined as an adjuvant therapy. We have no results for the SMT vs sham comparison, because we could only include one study. Finally, there is moderate quality evidence that manipulation has similar effects to mobilisation.

Table 2
Main treatment effects and GRADE summary of findings for all comparisons for the primary outcomes.
Regression coefficients (β) and 95% confidence intervals (CI) of the intervention effects of random-effect models adjusted for baseline using REML (one stage analysis) are presented. Positive difference in effect indicates a greater increase in quality of health for the SMT group compared to the control. MD = mean difference. OR = odds ratio.
a Recovery was classified as 'recovered' if the participant scored more than 50% improvement, was (much) better or had no symptoms. Medication use was classified as 'medication use' for those taking any medication for LBP, while not taking any medication was classified as 'no medication use'. Return to work was classified as 'returned to work' if participants had returned to work or if there were no sick days recorded. Satisfaction was classified as 'satisfied with care' if participants were (completely) satisfied or had scores >75%.

Our results are consistent with the recently published aggregate data review [5] and with other recently published systematic reviews [4,45,46].
It is somewhat difficult to interpret these findings, particularly because SMT demonstrates similar effects to both recommended and non-recommended therapies, and when examined as an adjuvant therapy. This appears confusing and requires explanation. Firstly, most studies we identified examined the effect of SMT vs recommended therapies. In general, these studies were larger, had more data at follow-up time points and were of better methodological quality (i.e. low risk of bias) than the studies in the other comparisons. These findings are therefore more robust and we have more confidence in their effect estimates, even though there is generally moderate quality evidence according to GRADE for all of these comparisons. While there are general guidelines for applying GRADE, there is no consensus. For example, we used a general rule of thumb when evaluating 'imprecision', in accordance with what might be considered an 'optimal information size'. Applying a more stringent optimal information size would result in lower quality evidence for SMT vs non-recommended therapies or SMT as an adjuvant therapy, but not for SMT vs recommended therapies (because the latter analyses included more than 1000 subjects). Secondly, categorizing interventions into recommended or non-recommended interventions was not always straightforward (e.g. myofascial therapy) and was therefore open to interpretation. While a sensitivity analysis could have helped to resolve this issue, the data were not sufficiently robust to make this possible. Lastly, the categorization of an intervention as 'non-recommended' does not imply that these interventions do not have an effect or are dangerous or ill-advised. While trials in which patients are 'blinded' (i.e. sham-controlled) would help to resolve this issue, in our estimation no single study was adequately able to do so.
An important difference between our IPD analysis and traditional aggregate meta-analyses is that we could adjust for the covariates baseline pain and functional status, and were not dependent upon how these data were reported in the original publications. This has increased the precision of our estimates compared to aggregate data meta-analyses, but did not lead to a different conclusion for the main effects.
It will be difficult to justify the required financial and participant resources for further trials comparing SMT vs current recommended therapies, as this is unlikely to change our overall conclusions. Others have previously made the same observation with regard to trials of exercise treatment for low back pain. A 2019 IPD meta-analysis of exercise therapy for low back pain has also produced precise estimates for effectiveness [47]. Therefore, instead of reproducing the same type of trials, future studies should focus on cost-effectiveness, optimal dosage, the delivery route that minimizes side-effects, the specificity of the location treated and how to maximize the non-specific effects of care.

Strengths and limitations
The most important strength is the sample size and the diversity of studies, meaning these results are likely to be broadly generalizable to clinical practice.On the other hand, this diversity in studies led to a difference in methodological quality and types of outcomes and covariates across trials and introduced statistical heterogeneity, which can introduce difficulties in interpreting the data.We investigated this diversity with sensitivity analyses and two-stage analyses for primary outcomes at all follow-up measurements, but could not explain the statistical heterogeneity.Our understanding of the effects of SMT would improve if we had an understanding of the aetiology of LBP and how SMT works.
The most important limitation is potential selection bias.We included only 50% of the eligible trials, which is comparable to other IPD studies [47,48].In a two-stage analysis we examined the effect sizes of those that were eligible but did not provide data.Results suggest only small differences between included studies and those for which we were unable to source the original data, indicating that the RCTs included are likely to be a representative sample of all published studies.Also, the range of studies based upon publication date and methodological quality of the studies we included is comparable with the non-included studies in the recently published review [5].Therefore, this facilitates an effective comparison of interventions across trials.Also, our review differs slightly from our protocol with regard to the classification of the comparator.In our protocol, we classified therapies into effective and non-effective, whereas in this review we classified them into recommended and non-recommended therapies.It was thought this would best help translation of findings to clinical practice.This did not affect the reported result, but was more a wording difference.Finally, longitudinal analyses would have provided us with more information on the individual pattern of changes over time.In the future, these can be run, when programs are able to process these large amounts of data.

Implications for clinicians
SMT is similarly effective to recommended and non-recommended interventions, and when added as an adjuvant therapy, in reducing pain and improving function in patients with chronic LBP. For patients with chronic LBP, SMT is a treatment option. SMT can be delivered as a standalone therapy, although it is typically offered within the constructs of a broader treatment package, together with exercise therapy or combined with usual care, as is recommended in recent national guidelines for low back pain [6][7][8]. This is important because SMT is by nature a passive treatment. Therefore, to prevent inappropriate behaviour and to empower patients to take control of their condition, it is vital that practitioners impart evidence-based messages about passive interventions such as SMT. The choice of treatment should be the result of a shared decision-making process, taking patient preferences and clinicians' experience and skills into account. No more research is needed to support these recommendations. Further similar research is unlikely to change these conclusions.
Adverse events were often not recorded and, when recorded, were measured differently across trials. Consequently, we were not able to pool these data. These data did not provide more information than the adverse events described in our systematic review of aggregate data [5].

Conclusion/clinical implication
Sufficient evidence suggests that SMT provides similar outcomes to recommended therapies for pain relief and improvement of functional status. SMT would appear to be a good option for the treatment of chronic LBP.

Ethical approval:
The study protocol was approved by the Review Board of the coordinating institution (EMGO Institute VU University Amsterdam). The protocol has also been approved by the Ethical Committee (Ref. No. 2015.554) of the VU University.
Negative difference in effect indicates a greater estimated decrease in pain or improvement in function for the SMT group compared to the control. MD = mean difference of combined pain score on a 0-100 scale. SMD = standardized mean difference of combined functional status score. a Pain measured on a 0-100 point scale. b All studies in the SMT + intervention vs intervention alone comparison measured the Roland Morris Disability Questionnaire; therefore we use a mean difference. c Based on one small study.

Table 1
Patient characteristics at baseline for groups receiving SMT vs groups receiving recommended interventions (s = 12; n = 2475).
SD = standard deviation; s = number of studies; n = number of participants; * combining categories was not meaningful or no data available.

Table 3
Main treatment effects and GRADE summary of findings for all secondary outcomes.

Table 4
Representativeness of the pooled effects of studies providing data for the IPD study and those not providing data.Two stage analysis; SMT vs recommended therapies.