Introduction

Exercise is an evidence-based approach to the management of chronic, non-specific low back pain in therapy and rehabilitation1,2,3,4. In general, strength/resistance and coordination/stabilisation exercise programmes appear to be superior to other interventions in the treatment of chronic low back pain5. Specifically, numerous meta-analyses on chronic, non-specific low back pain highlight the effects of motor control exercise therapies on the reduction of pain and disability, as well as on improvements in functional performance, as an acute, long-term2 and sustainable treatment6. These types of sensorimotor/stabilisation training, which aim to address neuromuscular deficits, are the most established forms of therapy in low back pain treatment2,5. The following intervention labels indicate the use of sensorimotor training principles in the context of chronic low back pain treatment: motor control, sensorimotor, perturbation, neuromuscular, core stability, stabilisation, Pilates-based stabilisation and instability training. The superordinate principles, namely musculoskeletal control by afferent sensory/proprioceptive input, central nervous system integration of the afferences and optimal stabilisation to ensure functional dynamic joint stability during perturbative situations, are key components of all the above-mentioned training forms7. The meta-analyses on the effects of these training forms2,3,4,8,9 have not identified the training characteristics (period, duration, frequency, intensity, etc.) that are likely to produce the largest effect. The optimal dose for maximal treatment success is thus still unknown1,10.

It is evident that the success of exercise interventions in the therapy of musculoskeletal disease (including non-specific low back pain) depends on high patient adherence to the therapy plan. Regarding the therapy of chronic, non-specific low back pain, the dose-response relationship between stabilisation exercise interventions and pain reduction is of great interest to policy makers, clinicians and individuals. van Tulder et al.4 reported in their systematic review that exercise interventions with a high training dosage (≥ 20 h) are more effective in improving pain and function in chronic, non-specific low back pain patients. Further information on the period, duration, frequency and intensity was not presented. Saragiotto et al.2 reported that the duration of the motor control intervention programmes in the studies included in their meta-analysis ranged widely, from 20 days to 12 weeks. The number of treatment sessions per week ranged from one to five. Partly as a result of this variance in training scheduling, large heterogeneity was found in the meta-analyses highlighted above. Decreasing this heterogeneity would, on the one hand, increase the level of evidence for the effects of stabilisation exercises in low back pain patients. On the other hand, and with a much higher impact on clinical and scientific practice, determining an optimal dose-response relationship, and deriving from it recommendations on how an intervention should be structured in terms of training type, duration, frequency and intensity, is of great relevance. As a high risk of bias11 and a low sample size12 of the studies included in meta-analyses are known to have an impact, these potential confounders should likewise be considered in dose-response analyses.

The purpose of this systematic review with meta-regressions was to (1) delineate the dose-response-relationship of stabilisation exercises and (2) derive recommendations for the stabilisation exercises’ training specifics that could maximise the reduction of pain and disability in chronic, non-specific low back pain patients.

Methods

The presented systematic review with meta-regression was conducted in accordance with the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)13.

Literature research

The literature research was performed using the digital peer review-based databases PubMed (Medline), Web of Knowledge and the Cochrane Library. The following Boolean search syntax was applied (example for the PubMed search): (stabili* OR sensorimotor OR "motor control" OR neuromuscular OR perturbation) AND (exercise OR training OR therapy OR intervention OR treatment) AND ("low back pain" OR lumbalgia OR "lower back pain" OR dorsalgia OR backache OR lumbago OR LBP OR "back pain").
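The structure of the search string (three OR-groups joined by AND) can be illustrated with a minimal Python sketch. The function name `or_group` and the list names are illustrative assumptions and not part of the original search protocol; only the terms themselves are taken from the text above.

```python
def or_group(terms):
    """Join search terms with OR; multi-word terms are quoted as phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

# Terms copied from the search syntax reported above
intervention_type = ["stabili*", "sensorimotor", "motor control",
                     "neuromuscular", "perturbation"]
treatment = ["exercise", "training", "therapy", "intervention", "treatment"]
condition = ["low back pain", "lumbalgia", "lower back pain", "dorsalgia",
             "backache", "lumbago", "LBP", "back pain"]

# The three groups are combined with AND
query = " AND ".join(or_group(g) for g in (intervention_type, treatment, condition))
print(query)
```

Keeping the three concept groups as separate lists makes it easier to adapt the string to the syntax of other databases.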

Two reviewers (JM & DN) independently conducted the literature research. Subsequently, the identified studies were screened for eligibility, first by title and then by abstract. Afterwards, the remaining full texts were assessed for eligibility by applying the inclusion and exclusion criteria (Table 1). Disparities were resolved by consensus; if necessary, a third reviewer (N.N.) was to be consulted. After study retrieval, additional studies were identified by manually searching the reference lists (cross-referencing) of the selected articles. The search was limited to full-text availability, publication up to the 30th of March 2020 and the languages English or German (Table 1).6

Table 1 Inclusion and exclusion criteria for both the studies and participants.

Inclusion and exclusion criteria

The inclusion and exclusion criteria were defined with respect to population, intervention, control/comparator and outcome (PICO). The detailed criteria for both the participants and studies are displayed in Table 1.

Data extraction

The common effect estimators for pain intensity and disability were retrieved from each study. The intervention group baseline-to-post effect sizes (Cohen’s d) were calculated as the change in mean values from baseline to post-intervention assessment divided by the baseline standard deviation for the respective scale. All data of interest were retrieved from the individual study data; for this purpose, a data extraction form designed for this review was used. Data on training dose and frequency were retrieved according to the TIDieR checklist. One researcher recorded all the pertinent data from the included articles and the other author independently reviewed the extracted data for relevance, accuracy and comprehensiveness. Disparities were resolved by consensus; if necessary, a third reviewer (N.N.) was asked to resolve them. Authors of included studies who had not reported sufficient details in the published manuscript were contacted by e-mail with a request for further data. The effect estimators for pain intensity and disability were calculated using either the visual analogue scale (VAS), the numeric rating scale (NRS) or the sum score inherent to the scale/assessment tool (0–10, 0–24 or 0–100), as the calculation of standardised mean differences is scale independent. For such data, only the direction (lower values mean less pain, less disability) was normalised. For scale-dependent calculations (inverse weighting, calculated as sample size divided by the squared standard deviation of the baseline-to-post difference), z-transformed (0–10) variables were used. Missing standard deviations for the differences were imputed according to the procedure described by Follmann et al.14.
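The two calculations described above can be sketched in a few lines of Python. This is a minimal illustration, not the extraction procedure actually used; the function names and the example values are hypothetical, and the sign convention (negative d means less pain) is an assumption based on the stated direction of the scales.

```python
def cohens_d_pre_post(mean_pre, mean_post, sd_pre):
    """Baseline-to-post effect size: mean change divided by the baseline SD."""
    if sd_pre <= 0:
        raise ValueError("baseline SD must be positive")
    return (mean_post - mean_pre) / sd_pre


def inverse_weight(n, sd_diff):
    """Weight as described in the text: n / (SD of the pre-post difference)^2."""
    if sd_diff <= 0:
        raise ValueError("SD of the difference must be positive")
    return n / sd_diff ** 2


# Hypothetical study (values invented for illustration):
# pain on a 0-10 scale falls from 6.0 to 4.0, baseline SD 2.0, n = 30
d = cohens_d_pre_post(6.0, 4.0, 2.0)   # (4.0 - 6.0) / 2.0 = -1.0
w = inverse_weight(30, 1.5)            # 30 / 2.25
print(d, round(w, 2))
```

Because d is divided by the baseline SD, the same function can be applied to 0–10, 0–24 or 0–100 scales, whereas the weight depends on the scale and would therefore be computed on the transformed (0–10) variables.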

Study quality assessment

The Physiotherapy Evidence Database (PEDro; 11 criteria) scale was used to assess the methodological quality of all trials included. The PEDro scale is a valid and reliable tool to rate the internal study validity and methodological quality of controlled studies15. If available, the validated rating scores of the articles were taken directly from the PEDro database (website; 35 out of 46 articles). If not, both authors evaluated the articles; each criterion was rated as 1 (definitely yes) or 0 (unclear or no), and potential disagreements were discussed between the two authors and resolved. Overall, the scale ranges from 0 (high risk of bias) to 10 (low risk of bias), with a sum score of ≥ 6 representing the cut-off for sufficient study quality. As study quality was considered a potential explanatory variable for the homogeneity of the effect sizes, all studies, irrespective of their quality, were analysed.
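The scoring rule can be written as a short Python sketch. It assumes the standard PEDro convention that the first of the 11 criteria (eligibility criteria) is rated but not counted, which is why the sum score ranges from 0 to 10; the function names and the example ratings are hypothetical.

```python
def pedro_sum_score(item_ratings):
    """Sum score from 11 item ratings (1 = definitely yes, 0 = unclear or no).

    Item 1 (eligibility criteria) is not counted, so the result is 0-10.
    """
    if len(item_ratings) != 11 or any(r not in (0, 1) for r in item_ratings):
        raise ValueError("expected 11 ratings, each 0 or 1")
    return sum(item_ratings[1:])


def sufficient_quality(score, cutoff=6):
    """Apply the cut-off of >= 6 points used in this review."""
    return score >= cutoff


# Hypothetical trial: no blinding of participants, therapists or assessors
ratings = [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1]
score = pedro_sum_score(ratings)
print(score, sufficient_quality(score))
```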

Risk of bias within the studies

The two review authors (JM and DN) independently rated the risk of bias of the outcomes pain and disability in the included studies by using the Cochrane Collaboration’s Risk of Bias 2 tool16,17. Studies’ outcomes were graded for risk of bias in each of the following domains: sequence generation, allocation concealment, blinding (participants, personnel, and outcome assessment), incomplete outcome data, selective outcome reporting and other sources of bias. For the outcomes, each item was rated as “high risk”, “low risk” or “unclear risk” of bias. Again, any disagreements were discussed between the raters. If a decision could not be reached after discussion, a third reviewer (N.N.) was included to resolve any conflicts. As the risk of bias was (indirectly, via the PEDro sum score) considered a potential explanatory variable for the homogeneity of the effect sizes, all studies, irrespective of the risk of bias, were analysed in the meta-regressions.

Risk of bias across the studies

The risk of publication bias across all the studies was assessed using funnel plots18. Review Manager 5.3 (RevMan, Version 5.3, Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2014) was used for funnel plotting.

Data processing and statistical analysis

Data were initially plotted using scatterplot diagrams. The type of association between each independent and dependent variable was determined visually. In the case of a linear association, data were processed as real values; if a curvilinear association was found, data were re-calculated using logarithmic transformations (log-association) or Taylor series (U-shaped associations) to provide linearity for the regression calculation.
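The idea of linearising a predictor before regression can be illustrated with a minimal Python sketch. The text does not specify the exact form of the Taylor-series transformation, so the squared distance from an assumed turning point is used here as a stand-in for a U-shaped association; the function names, the turning point and the example values are hypothetical.

```python
import math


def log_transform(values):
    """Natural-log transform for predictors with a logarithmic association."""
    return [math.log(v) for v in values]


def quadratic_transform(values, centre):
    """Squared distance from an assumed turning point (U-shaped association).

    This second-order term is an assumption, not the authors' exact procedure.
    """
    return [(v - centre) ** 2 for v in values]


# Hypothetical session durations (min) and weekly frequencies
durations = [15, 30, 60]
frequencies = [1, 2, 3, 4, 5]

log_durations = log_transform(durations)
u_frequencies = quadratic_transform(frequencies, centre=4)
print(log_durations, u_frequencies)
```

After the log transform, doubling the session duration always adds the same increment to the predictor, which is what allows a straight line to describe a flattening dose-response curve.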

Sensitivity meta-regressions for dose-response analyses and the impact of study quality were conducted as described in Niederer & Mueller (2020)6. A syntax for SPSS (IBM SPSS 23; IBM, USA) was used (David B. Wilson; Meta-Analysis Modified Weighted Multiple Regression; MATRIX procedure Version 2005.05.23). Inverse variance weighted regression models with random intercepts (random effect model, fixed slopes model) were applied, with pain intensity and disability effects (simple pre-post Cohen’s ds) as the dependent variables and the following independent variables: intervention period [weeks, U-shaped], intervention frequency [number of trainings/week, U-shaped], training duration per session [minutes, logarithmised] and intervention total dose [minutes]. The sample size (SE group) and the study quality PEDro sum score [points, linear] were considered as co-factors. Homogeneity analysis (Q- and p-values) and meta-regression partial coefficients B (95% confidence intervals and p-values) were calculated. All statistical analyses were tested against a 5% alpha-error probability level.
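The core of a weighted meta-regression can be sketched in plain Python. Note that this is a deliberately simplified fixed-effect weighted least squares fit with a single predictor; it is not the random-intercept multiple regression of the SPSS macro used in this review, and the function name and the data are hypothetical.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x.

    Returns (slope, intercept, r2), where r2 is the weighted share of
    variance in y explained by x.
    """
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    syy = sum(wi * (yi - ym) ** 2 for wi, yi in zip(w, y))
    if sxx == 0:
        raise ValueError("predictor has no variance")
    slope = sxy / sxx
    intercept = ym - slope * xm
    r2 = sxy ** 2 / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r2


# Invented data: (transformed) predictor, effect size and weight per study
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0]
w = [1.0, 2.0, 3.0, 4.0]
slope, intercept, r2 = weighted_linear_fit(x, y, w)
print(slope, intercept, r2)
```

Studies with larger weights pull the fitted line towards their effect sizes, which corresponds to the bubble sizes in a meta-regression plot.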

Effect estimators’ level of evidence

The quality of the evidence revealed by the meta-analyses was graded using the tool established by the GRADE working group19. The quality of evidence was categorised as “very low” (the estimate of effect is very uncertain), “low” (further research is likely to change the estimate), “moderate” (further research may change the estimate) or “high” (further research is very unlikely to change the estimate of effect) (plus interim values). The grading starts with the type of evidence (RCT = high, observational = low, all other study types = very low) and is decreased or increased based on study limitations, inconsistencies, uncertainty about directness, imprecise data and reporting bias (decreasing items), or strong associations, dose-response findings and confounder plausibility (increasing items)19.
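The grading logic can be expressed as a small Python sketch. It assumes that each decreasing or increasing item shifts the rating by exactly one level and ignores the interim values mentioned above; the names `LEVELS`, `START` and `grade_quality` are illustrative and not part of the GRADE tool.

```python
LEVELS = ["very low", "low", "moderate", "high"]

# Starting level by type of evidence, as described in the text
START = {"rct": 3, "observational": 1, "other": 0}


def grade_quality(study_type, downgrades=0, upgrades=0):
    """Return the evidence level after applying one-step down- and upgrades."""
    score = START[study_type] - downgrades + upgrades
    score = max(0, min(score, len(LEVELS) - 1))  # keep within the four levels
    return LEVELS[score]


# RCT evidence with three decreasing and two increasing items
print(grade_quality("rct", downgrades=3, upgrades=2))
# RCT evidence with three decreasing items and one increasing item
print(grade_quality("rct", downgrades=3, upgrades=1))
```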

Recommendations were derived using a clinical guideline developing tool20. Overall, four key factors were applied to determine the strength of the recommendations: (1) the balance between desirable and undesirable effects (larger differences between desirable and undesirable effects lead to stronger recommendations); (2) the quality of the available evidence; (3) values and preferences (higher variations lead to weaker recommendations); and (4) costs (higher costs lead to weaker recommendations). More comprehensive details can be found in ref. 21.

Results

Study selection

The database search was completed in 03/2020. Figure 1 displays the research procedure and the flow of the study selection and inclusion.

Figure 1

Research, selection and synthesis of included studies. n, number; Eng, English; Ger, German; WoK, web of knowledge.

Study characteristics and individual studies’ results

Fifty (50) studies were included in the qualitative and in the quantitative analyses. Study characteristics and the main results are displayed in Table 2. For each of the studies included, methodological aspects, participants’ characteristics and key results are presented. Overall, 2,786 participants, of whom n = 1,239 were stabilisation exercise participants, were included in the analysis.

Table 2 Study quality (PEDro scale) and risk of bias assessment.

All included studies adopted a randomised controlled design (RCT). The main inclusion criterion was (chronic) non-specific low back pain ≥ 4 weeks22, ≥ 6 weeks23, ≥ 7 weeks24, ≥ 8 weeks25,26,27, ≥ 12 weeks28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55, ≥ 24 weeks56,57,58 and ≥ 2 year history59, whilst 11 studies60,61,62,63,64,65,66,67,68,69,70 did not present this information. The baseline pain and the effect sizes (Cohen’s d, stabilisation exercise group only) for pain and disability are presented in Table 3.

Table 3 Study characteristics (left columns) and the individual studies’ results (right columns). For each of the studies included, the methodological aspects, participants’ characteristics and key results are displayed.

Study quality and risk of bias within studies

Both the study quality and risk of bias ratings are presented in Table 2. The overall study quality ranged from 3/10 to 9/10 points, with a mean of 5.7 ± 1.4 points on the PEDro scale.

Individual studies' training characteristics

Table 4 summarises the individual studies’ training characteristics. All interventions and the comparators are described. The stabilisation exercises are called core stability exercise25,27,30,31,32,47,51,54,61,62,63,69, motor control exercise24,35,38,44,45,48,50, stabilisation23,26,28,34,41,52,55,60, lumbar stabilisation exercise39,46,56,64,67, spinal stabilisation33,37,68, sensorimotor training66,71, trunk stability exercise49,58, Swiss ball stabilisation43,65,70, perturbation training29, sling training53, McGill stabilisation exercise40,57, segmental stabilisation exercise36, neuromuscular exercise22, multifidus muscle retraining59 and Pilates-based exercise42. The intervention period ranged between 2 weeks69 and 24 weeks22, with a mean of 7.0 ± 3.3 weeks. Training frequency ranged from 1 session29 to 12 sessions53 per week, with a mean of 3.1 ± 1.8 sessions; 3 studies24,55,63 did not report this information. Mean training time per session was 44.6 ± 18.0 min, with a range from 15 min24 to 90 min29,33 (9 studies35,47,49,54,62,63,65,67,68 did not report on this aspect). The number of exercises practised per session varied from 2 exercises35,47,49,54,62,63,65,67,68 to 18 exercises29, with a mean of 7.2 ± 3.9 exercises; 13 studies30,32,35,37,40,44,48,50,52,53,56,58 did not report this information.

Table 4 Individual studies’ training specifications.

The qualitative analysis of the training volume revealed a range of 1 set30,32,35,37,40,44,48,50,52,53,56,58,70 to 10 sets24,44,46,59,60 per exercise practised, with a mean of 3.2 ± 2.4 sets, while 28 studies22,25,28,30,31,32,33,34,35,38,39,40,41,43,45,49,50,51,53,54,56,58,63,67,68,69,70,71 did not report any details on this aspect. In addition, only 23 studies22,25,28,30,31,32,33,34,35,38,39,40,41,43,45,49,50,51,53,54,56,58,63,67,68,69,70,71 reported the number of repetitions per set per exercise, with a range of 6 repetitions23,24,26,27,36,38,42,44,46,47,48,50,51,52,55,57,59,60,61,64,65,66,69 to 30 repetitions66 (mean: 13.6 ± 5.6 repetitions per set per exercise). Furthermore, only 12 studies29,30,42,46,48,57,59,60,61,62,64,65 reported the systematic use of rests between exercises, ranging from 15 s29,30,42,46,48,57,59,60,61,62,64,65 to 300 s65 (mean: 106.3 ± 86.5 s).

Meta-regression analysis

The results of the meta-regressions are highlighted in Table 5. The total variance explanation was 44% for pain and 15% for disability. When all the other predictors were partialled out, moderate quality evidence revealed that a training duration of 20 to 30 min elicits the largest impact on the effect sizes (both in pain and disability) of stabilisation exercise training in low back pain patients. The quality of evidence was downgraded due to risk of bias (− 1), imprecise data (wide confidence intervals, − 1) and (some) uncertainty about directness (− 1), and upgraded due to the dose-response relationship (+ 1) and the consideration of confounders (+ 1).

Table 5 Outcomes of the sensitivity meta-regressions.

More detailed information on the meta-regressions is depicted in Fig. 2. The training period showed no systematic impact on the effect size for pain intensity (Fig. 2A). Training frequency showed an inverted U-shaped association with the effect size (13% variance explanation; Fig. 2B), and training duration showed a logarithmic association with the pain effect size (23% variance explanation; Fig. 2C). Low quality evidence suggested that training 3 to 5 times per week leads to the largest effect of stabilisation exercise in chronic, non-specific low back pain patients. The quality of evidence was downgraded due to risk of bias (− 1), imprecise data (wide confidence intervals, − 1) and (some) uncertainty about directness (− 1), and upgraded due to the dose-response relationship (+ 1).

Figure 2

Meta-regression bubble plots for the dependent variable Cohen’s d (pain) and the independent variables training period (weeks, A), training frequency (times/week, B) and training duration (minutes, C). The weighting is illustrated by the size of the bubbles.

Risk of bias across studies

The risk of bias across studies (publication bias) is, by means of a funnel plot, highlighted in Fig. 3. It reveals an unclear, but rather low, risk of publication bias.

Figure 3

Funnel plot of all studies included. Each first sustainability SMD (standardised mean difference) and its corresponding SE (standard error) are plotted.

Discussion

This systematic review with meta-regression examined the dose-response-relationship of stabilisation exercise interventions in chronic, non-specific low back pain patients and, thus, derived recommendations for the stabilisation exercises’ training characteristics in this special cohort.

Summary of main results

The main findings of the presented meta-regression are that: (1) moderate quality evidence indicates that a training duration of 20 to 30 min elicits the largest impact on the effect sizes on both pain and disability of core-specific stabilisation interventions in non-specific chronic low back pain patients, (2) low quality evidence advocates that training 3 to 5 times per week leads to the largest effect of core-specific stabilisation exercise in chronic, non-specific low back pain patients with an inverted U-shaped association with the effect size and (3) no systematic impact of the training period (duration of intervention in weeks) on the effect size for pain intensity was found.

Comparison with other evidence

Saragiotto et al.2 reported a wide range of 20 days to 12 weeks in the period of the applied motor control intervention programmes in their meta-analysis. The number of treatment sessions per week varied from 1 to 5. This partly overlaps with the results of our meta-regressions. Nevertheless, a detailed analysis of the effect of training characteristics on pain reduction is missing in their systematic review2. The current evidence only supports the use of general and stabilisation exercise (covering sensorimotor, stabilisation and/or core stability) in the therapy of chronic non-specific low back pain2. Regarding the training period (weeks of intervention), our results showed no systematic impact on the effect size for pain intensity. Taking the current knowledge on the effects of and adaptation to sensorimotor training into account, a period of about six weeks seems to be both feasible and effective. This is in accordance with our quantitative results (mean period of 7.0 ± 3.3 weeks). However, future research is required to define evidence-based recommendations on this aspect.

Low quality evidence supports an inverted U-shaped association of the training frequency (sessions per week) with the effect size on improvement of pain and disability in chronic, non-specific low back pain patients. The overall relationship between (the amount of) physical activity and low back pain is considered to be U-shaped. This means that both the absence of exercise and extremely high levels of physical activity (elite sports) may increase the risk of developing (low) back pain. In contrast, a "normal" (medium) level of physical activity shows the lowest risk and, therefore, appears to be protective2,3,4,8,9. Our finding that 3 to 5 sessions per week yield the largest effect is consistent with this pattern. In addition, moderate quality evidence indicates that a training duration of 20 to 30 min elicits the largest impact on the effect sizes on pain and disability; this may correspond to the patients’ essential need to achieve pain reduction with minimum effort (time). Nevertheless, this is partly in contrast to the result of van Tulder et al.4, who concluded that exercise interventions with a high dosage (> 20 h) have the highest effect. However, van Tulder et al.4 did not point out how this dosage should be applied (duration, frequency). Supported by our findings, it may be more effective to reach this dosage with a high-frequency, short-bout type of intervention. One of the main reasons for treatment failure in exercise therapy is the low adherence of patients to their scheduled therapy4. Lack of time and long journey times to the therapy centre are commonly cited barriers to regular participation in therapy sessions72. Therefore, patients and physiotherapists are constantly searching for the minimum dose that is still effective. Based on our results, we can recommend exercising for more than 2 sessions per week with a minimum of 20 to 30 min per session. Nevertheless, there is still a need for future research on the minimal dosage in the context of stabilisation exercise interventions for chronic, non-specific low back pain patients.
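To make the relation between the 20 h total dosage and a high-frequency, short-bout schedule concrete, the following Python sketch computes how many weeks such a schedule would need. It is plain arithmetic under the assumption of a constant weekly schedule; the function names are hypothetical and the result is not a finding of this review.

```python
def total_dose_minutes(weeks, sessions_per_week, minutes_per_session):
    """Total training dose in minutes for a constant weekly schedule."""
    return weeks * sessions_per_week * minutes_per_session


def weeks_to_reach(target_hours, sessions_per_week, minutes_per_session):
    """Number of weeks needed to accumulate a target dose given in hours."""
    weekly_minutes = sessions_per_week * minutes_per_session
    return target_hours * 60 / weekly_minutes


# Lower and upper end of the schedules discussed above
print(weeks_to_reach(20, 3, 20))   # 3 x 20 min per week
print(weeks_to_reach(20, 5, 30))   # 5 x 30 min per week
```

Under these assumptions, a 20 h dosage would take between 8 weeks (5 × 30 min) and 20 weeks (3 × 20 min).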

Practical relevance and recommendations

The training-dose and effect-response relationship between core-specific stabilisation exercise interventions and pain reduction or disability improvement in chronic, non-specific low back pain patients is of great interest to policy makers, health insurers and clinicians, as well as the persons affected. This review provides low to moderate quality evidence that a core-specific stabilisation intervention of 3 to 5 times per week, 20 to 30 min per session, has a positive effect on pain reduction and improvement of disability in low back pain patients. In conclusion, we suggest the following graded recommendations:

Grade A recommendation: At the group level, stabilisation exercise is likely to be most effective to treat non-specific low back pain when it is scheduled with a time per session of 20–30 min.

Grade C recommendation: At the group level, stabilisation exercise to treat non-specific low back pain is potentially most helpful when it is scheduled three to five times a week.

Future study

Nevertheless, the evidence on more detailed training specifics (training intensity: number of exercises per session, repetitions per exercise, sets per exercise, rest after exercise, etc.) remains unclear. Furthermore, the minimal clinically relevant dosage of core-specific stabilisation interventions in chronic, non-specific low back pain patients remains unclear; this may define a future area of low back pain research, given the societal burden of a consistently high low back pain prevalence across the lifespan.

Limitations

Limitations at the study and outcome levels

A common limitation in exercise trials is the limited possibility to blind the participants. This limitation is increased by the self-reported assessment of pain and pain-related function.

Limitations at the review level

We only screened the databases PubMed (Medline), Web of Knowledge and the Cochrane Library. Considering the topic of our review, almost all manuscripts of interest should be found therein73,74,75. However, expanding the search to further databases, such as EMBASE, PEDro, CINAHL, AMED and CENTRAL, might have led to slightly more hits.

One advantage of meta-regressions is that the interventional effect sizes are compared with each other to find a dose-response relationship; the effect sizes are thus expressed relative to each other. The estimates found are valid for the comparison of the isolated intervention group effects provided by the meta-regression. By the nature of the meta-regression, the mean effects are absolute and not relative to a control/comparator. The mean effect sizes (refer to the study description and meta-regressions) are thus not directly comparable to those found in meta-analyses where the effects are calculated in comparison to a control/comparator group.

The funnel plot analysis revealed an unclear, but rather low, risk of publication bias within our review. The findings of our (retrospective) meta-regression should be confirmed prospectively, at best adopting a prospective meta-analysis.

Sensitivity of the interventions’ name

The interventions of the studies included in our meta-analysis are defined as stabilisation exercise. Motor control exercises are classically defined as core-specific dynamic stabilisation exercises with an a priori education on deep trunk muscle activation and/or the control of deep muscle activation during exercising. We only included studies with dynamic/exercise parts. When stabilisation exercises are performed without such pre-conditioning, they are often called “coordination”, “stabilisation”5, “sensorimotor”76 or even “motor control”2 exercise. As described above, the term “motor control exercise” may be slightly too sensitive for the interventions included in our review. In contrast, the terms “sensorimotor”, “coordination” and “stabilisation” training/exercise may be too general. Consequently, we name the intervention “stabilisation exercise” to highlight that the stabilisation/active/dynamic parts of the approach originally described as “motor control exercise” are adopted. Nevertheless, the intervention could also be called “motor control stabilisation exercise” or “sensorimotor exercise”.

Conclusions

A training frequency of 3 to 5 times per week (low quality evidence) with a training duration of 20 to 30 min per session (moderate quality evidence) elicits the largest impact on the effect sizes (both in pain and disability) of stabilisation exercise in low back pain patients. However, the training period showed no systematic impact on the effect size for pain intensity. Future work is required to enhance the quality of the evidence of our findings, possibly focussing on the definition of a minimum dosage.