Manipulation-induced hypoalgesia in musculoskeletal pain populations: a systematic critical review and meta-analysis

Background Manipulation-induced hypoalgesia (MIH) represents reduced pain sensitivity following joint manipulation, and has been documented in various populations. It is unknown, however, whether MIH following high-velocity low-amplitude spinal manipulative therapy is a specific and clinically relevant treatment effect. Methods This systematic critical review with meta-analysis investigated changes in quantitative sensory testing measures following high-velocity low-amplitude spinal manipulative therapy in musculoskeletal pain populations, in randomised controlled trials. Our objectives were to compare changes in quantitative sensory testing outcomes after spinal manipulative therapy vs. sham, control and active interventions, to estimate the magnitude of change over time, and to determine whether changes are systemic or not. Results Fifteen studies were included. Thirteen measured pressure pain threshold, and four of these were sham-controlled. Change in pressure pain threshold after spinal manipulative therapy compared to sham revealed no significant difference. Pressure pain threshold increased significantly over time after spinal manipulative therapy (0.32 kg/cm2, CI 0.22–0.42), which occurred systemically. There were too few studies comparing to other interventions or for other types of quantitative sensory testing to make robust conclusions about these. Conclusions We found that systemic MIH (for pressure pain threshold) does occur in musculoskeletal pain populations, though there was low quality evidence of no significant difference compared to sham manipulation. Future research should focus on the clinical relevance of MIH, and different types of quantitative sensory tests. Trial registration Prospectively registered with PROSPERO (registration CRD42016041963).


Introduction
Background
Spinal manipulative therapy (SMT) is commonly utilised by patients seeking relief from spinal pain symptoms [1]. However, the neurophysiologic mechanisms of SMT and the reasons for positive clinical outcomes in some patients are poorly understood. Manipulation-induced hypoalgesia (MIH), a reduction in pain sensitivity following SMT, is one possible explanation. To date, much MIH research has been performed on asymptomatic populations and its clinical relevance is unknown.
Experimental pain research commonly involves quantitative sensory testing (QST). QST comprises a controlled nociceptive stimulus and standardised psycho-physical measurements of the resulting pain [2]. The most widely used QST type in manual therapy research is pressure pain threshold (PPT, which is the detection threshold for deep pain from pressure), though temporal summation (the change in subjective pain intensity during repeated nociceptive stimuli, typically using heat or pinprick), thermal pain detection thresholds, and others, are also used.
It is known that many people with chronic pain have increased pain sensitivity in a variety of QST measures [3,4]. Clinically, there is poor correlation between pain detection thresholds and subjective pain outcomes (pain intensity and disability), but fair correlations with subjective pain outcomes for pain tolerance thresholds and temporal summation evoked by heat [5].
SMT encompasses a variety of techniques, and sometimes mobilisation is included in the definition. For this review, we are specifically concerned with high-velocity low-amplitude (HVLA) SMT, which involves a rapid, controlled manual thrust targeting specific spinal joints [6]. The thrust is often accompanied by a "cracking" sound (termed cavitation) [6], though hypoalgesia appears to occur regardless of a cavitation [7]. In the following, SMT will refer to HVLA SMT unless specified differently. The mechanism for clinical pain relief associated with SMT is not well understood, though changes in QST measures may offer insight into this. Bialosky et al. [8] argue that the mechanical input of SMT (and manual therapy in general) leads to a neurophysiologic cascade. This may involve peripheral factors (e.g. changes in inflammatory mediators and nociceptors), spinal factors (e.g. altered dorsal horn neuron excitability), and supraspinal factors (e.g. periaqueductal gray activation). Furthermore, it is widely accepted that at least some of the pain relieving effect of SMT is attributable to placebo and contextual factors [8,9].

Previous research
Four previous systematic reviews on the topic of MIH conclude that SMT (and mobilisation) leads to increased PPTs (decreased pressure sensitivity) [10-13]. One concludes that this increase in PPT was significant compared to sham in asymptomatic populations [13], and the others do not make conclusions on this topic. A variety of other types of QST measures may also respond to SMT [12], though results for temperature-induced pain are mixed [11,12]. There is no clear consensus in these reviews regarding whether changes in QST occur only locally, regionally, or systemically in relation to the site of SMT. None of these reviews, however, specifically investigate changes in QST measures after HVLA SMT in musculoskeletal pain populations only.

Rationale and research questions
Since previous reviews do not adequately address whether MIH occurs in symptomatic populations, and the MIH literature has expanded significantly in recent years, we concluded that an up-to-date systematic critical review with meta-analysis was warranted. Our overarching aim was to investigate the literature on how HVLA SMT affects short-term QST measures in musculoskeletal pain populations, with the following specific research questions:

1. Is there a difference in change in QST measures after SMT compared to sham or control?
2. Is there a difference in change in QST measures after SMT compared to active interventions?
3. Do QST measures change over time after SMT?
4. Are any changes in QST measures after SMT local, regional, or remote?

Methods
This review was prospectively registered with PROSPERO, registration number CRD42016041963.

Eligibility criteria
We included only peer-reviewed randomised controlled trials in English, investigating change in any QST outcome before and after SMT, in human participants with musculoskeletal pain of any type and duration. We considered any year of publication, but arbitrarily limited studies to those with at least 10 participants per group in order to reduce the effect of spurious findings from particularly small studies. At least one group in each study had to receive HVLA SMT, not combined with any other therapy. SMT could be compared to any other active intervention, sham, or control. Studies had to measure at least one type of QST as a primary or secondary outcome measure, before and after the intervention on the same day.

Data sources and searches
Studies were identified through a comprehensive literature search of the databases PubMed, Scopus, and CINAHL from inception until December 8, 2016. An additional search was performed on September 21, 2017, to identify any additional articles published in the interim. A manual search of the reference lists of included articles was used to identify any further relevant studies. Since the language used to describe SMT and QST outcome measures is highly variable, we compiled an extensive list of relevant terms for each. Both lists were used in each search, joined with the Boolean operator 'AND'. Terms for SMT were: Terms for QST outcomes were: "pain perception" OR "pain sensitivity" OR "experimental pain" OR "experimental pain sensitivity" OR "experimentally induced pain" OR "experimentally-induced pain" OR "quantitative sensory testing" OR "pain measurement" OR "pain tolerance" OR "pain threshold" OR "pressure pain" OR "pressure pain threshold" OR "pressure sensitivity" OR "pressure pain sensitivity" OR "thermal pain" OR "mechanical pain" OR "exercise-induced pain" OR "electrical pain" OR "chemical pain" OR "pain modulation" OR "analgesia" OR "analgesic" OR "hypoalgesia" OR "hypoalgesic" OR "hyperalgesia" OR "allodynia" OR "algometry" OR "algometer" OR "temporal sensory summation" OR "temporal summation" OR "wind-up" OR "suprathreshold heat response"

Study selection
Titles and abstracts of search results were screened independently by an author (SA) and an independent reviewer (an academic health professional who was trained in the topic prior to the review) to identify articles for full text retrieval. Full text articles were retrieved based on reviewer agreement, and were screened independently, but unblinded, by two authors (SA and CLY) for inclusion. Disagreements were resolved by consensus between reviewers at each stage, with arbitration by a third author (BW) if required.

Data extraction and quality assessment
Data were independently extracted by two reviewers from the full text of the included studies using descriptive (SA and CLY), quality (SA and CLY), and results (SA and BW) checklists. The checklists were developed by consensus of two authors (SA and CLY) based on the needs of this review, and were pilot tested on two articles and refined. Any disagreements were resolved by consensus of the two relevant reviewers, and arbitration by a third author if required.
The descriptive information of interest was: 1) study design (number and size of groups, randomised controlled trial or cross-over design), 2) participant information (mean age, age range, sex distribution, type of musculoskeletal pain, source of participants), 3) details on SMT intervention (location, whether the therapist was allowed to choose the target joint, whether a second thrust was allowed), 4) comparators, 5) QST outcome measures (type, measured where and when), and 6) area. We used the term 'area' to describe the location of the QST measurement in relation to the location of the SMT, considering anatomical and neurological connections. The subgroups used are local, regional, and remote, defined in Table 1. These are based on dermatomal and myotomal patterns, and acknowledge that SMT lacks specificity, affecting multiple joints in the vicinity of the target joint [14,15]. Convergence of trigeminal nerve and upper cervical afferent inputs in the upper spinal cord has been demonstrated [16], thus we chose to classify the head and face as 'regional' in the case of upper cervical SMT.
Any specific types of QST measured in fewer than three studies, and studies that measured only these types, were excluded from quality and results tables and from meta-analysis, as conclusions would be difficult to make based on one or two studies. These studies were included in the descriptive table and are briefly discussed in the Results section.
Quality items were based on risk of bias items from the Cochrane Handbook for Systematic Reviews of Interventions [17] and the PRISMA Statement [18]. This approach was deemed appropriate since we were reviewing experimental rather than clinical outcomes. Articles were assigned quality scores out of 12 (maximum one point per item). The quality items, interpretation details, and scoring system are detailed in Table 2.
Two working results tables were constructed, for within-group change and between-group differences respectively (not presented). Within-group results are reported for SMT groups only. PPT is the only QST measure reported in the results tables, and is reported as actual and percentage change from baseline. Percentage change is helpful for interpretation, since absolute values can vary widely based on testing location [19]. Between-group results are reported as the difference in the mean change between SMT and comparator groups. Data were converted to kg/cm2 where relevant, and we calculated change in PPT and between-group differences based on data presented in the full text of the study (when required and if possible). Included in these tables were relevant descriptive information, statistical significance of results, and the articles' quality score.

Data synthesis and interpretation
The working results tables were colour coded based on the statistical significance of the results (alpha level .05). These working tables were used systematically to answer the research questions. These data are presented as a single results table without colour coding in this article, Table 5. Studies are ordered in the results table based on quality (highest to lowest). Since items related to risk of bias are included in the quality table, risk of bias was not considered separately. During the interpretation, we assessed whether studies of lower quality generally agreed or disagreed with studies of higher quality. This assisted in determining the weight to place on a result.

Meta-analyses
Meta-analyses were performed with Comprehensive Meta-Analysis V3 (Biostat, Inc., USA) software, using mean change from baseline and standard deviation (SD) of the change in each group. If an intervention group in a single study had two or more testing sites eligible for inclusion in a given meta-analysis, a combined mean change and variance was calculated, as recommended in Borenstein et al. [20]. This was in order to account for lack of independence with multiple outcome measures. The calculation for combined variance requires assuming a correlation between the outcomes being combined. Since we could not identify any published estimates of the correlation between PPT at different testing sites, we chose to run a sensitivity analysis by calculating the variances twice, based on high and low assumed correlations (0.75 and 0.25 respectively), and compare meta-analysis results. Studies were not excluded based on quality scores. Given the heterogeneity in testing sites, specific interventions, and study populations, analyses were run under a random effects model. Heterogeneity was assessed with I², with >75% indicating considerable heterogeneity between studies [17].
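The combined-variance step and its sensitivity analysis can be illustrated with a minimal Python sketch. It assumes the usual composite formula for the mean of m correlated outcomes, in which the variance of the composite is (1/m)² times the sum of the individual variances plus, for every pair of sites, 2r times the product of their standard errors. The function name `combine_sites` and the input values are hypothetical and are not data from any included study; the review itself used Comprehensive Meta-Analysis V3.

```python
import math
from itertools import combinations

def combine_sites(means, variances, r):
    """Composite of m correlated outcomes measured in the same group.

    means     : mean change at each testing site
    variances : variance of each mean change (squared standard error)
    r         : assumed correlation between any two sites
    Returns (combined mean, variance of the combined mean).
    """
    m = len(means)
    combined_mean = sum(means) / m
    total = sum(variances)
    # Each unordered pair contributes 2 * r * sqrt(V_i) * sqrt(V_j).
    for v_i, v_j in combinations(variances, 2):
        total += 2 * r * math.sqrt(v_i) * math.sqrt(v_j)
    return combined_mean, total / m**2

# Hypothetical example: two testing sites in one SMT group.
site_means = [0.30, 0.40]        # mean change in PPT, kg/cm2
site_variances = [0.010, 0.010]  # variance of each mean change

for r in (0.75, 0.25):           # high and low assumed correlations
    mean, var = combine_sites(site_means, site_variances, r)
    print(f"r={r}: mean={mean:.3f}, variance={var:.5f}")
```

A higher assumed correlation yields a larger combined variance, which is why the 0.75 assumption gives the more conservative result.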
If at least three studies included in the meta-analyses utilised repeated measures post-intervention with at least 15 min follow-up, we planned to analyse the data by groups of time points as appropriate, based on the spread of these time points. Otherwise we intended to use the first post-intervention measurement for all studies. We planned to perform the following meta-analyses:
1. Mean change from baseline of all SMT groups and testing areas
   a. Subgroup analyses: mean change from baseline for each testing area (local, regional, and remote)
2. Difference between SMT and each type of comparator (minimum three studies)
Numerous studies did not provide SDs of the change, but provided other data that allowed calculation of SDs as follows. If mean change from baseline with either 95% confidence intervals (CIs) or standard error was provided, SDs of the change were calculated based on formulae from section 7.7.3.2 ("Obtaining standard deviations from standard errors and confidence intervals for group means") of the Cochrane Handbook [17].
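As a rough illustration of these conversions, the sketch below recovers an SD from a standard error (SD = SE × √n) and from a confidence interval (SE = CI width divided by twice the critical value). It assumes the large-sample normal critical value (about 1.96 for a 95% CI); for small groups a t-distribution quantile would be used in its place. The function names and example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def sd_from_se(se, n):
    """SD of the change from its standard error and the group size."""
    return se * math.sqrt(n)

def sd_from_ci(lower, upper, n, level=0.95):
    """SD of the change from a confidence interval for the group mean.

    Uses the normal critical value, which is an approximation that is
    only appropriate for reasonably large groups.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = (upper - lower) / (2 * z)
    return sd_from_se(se, n)

# Hypothetical group: n = 25, mean change 0.30 kg/cm2.
print(round(sd_from_se(0.10, 25), 3))
print(round(sd_from_ci(0.10, 0.50, 25), 3))
```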
Several studies separated results into right and left or ipsilateral and contralateral results at each testing site. Sides were combined by averaging the mean of each side, and using a formula to calculate combined SDs of the change, provided in Table 7.7.a ("Formulae for combining groups") in the Cochrane Handbook [17]. One study [21] reported data separately for participants who received right and left cervical SMT. These groups were also combined using the same formula, since we were not interested in side to side differences of MIH and the study reported there were no differences between groups.
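The group-combining formula can be sketched in Python as follows. This version is the standard formula for merging two independent groups into one (combined N, sample-size-weighted mean, and an SD that accounts for the difference between the group means), so it corresponds most directly to the right and left cervical SMT groups described above. The function name and the example values are hypothetical.

```python
import math

def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Merge two groups into one; returns (N, mean, SD) of the merged group."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    # Within-group sums of squares plus a term for the gap between means.
    numerator = ((n1 - 1) * sd1**2
                 + (n2 - 1) * sd2**2
                 + (n1 * n2 / n) * (m1**2 + m2**2 - 2 * m1 * m2))
    sd = math.sqrt(numerator / (n - 1))
    return n, mean, sd

# Hypothetical right-side and left-side SMT groups (change in PPT, kg/cm2).
n, mean, sd = combine_groups(15, 0.35, 0.60, 14, 0.28, 0.55)
print(n, round(mean, 3), round(sd, 3))
```

The result is identical to the mean and SD that would be obtained by pooling the raw observations of both groups.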

Results
A total of 1868 records were identified in the initial search, completed on December 8, 2016, with none added after reference list searches. Seventy-three articles were retrieved for full text review, with 14 identified as meeting inclusion criteria. The follow-up search on September 21, 2017, identified one additional article. PPT was the only type of QST measured in three or more studies, thus only PPT was addressed in quality and results tables and meta-analyses. The tables were adapted to specifically suit PPT, excluding two studies that did not measure PPT [22,23]. Two further studies were excluded from meta-analyses due to insufficient data reporting [24,25]. See Fig. 1. Table 3 contains a full description of the 15 included studies' characteristics.

Populations
The weighted mean age of participants was 35.9 years (mean age range 23-46 years), with a total of 901 participants across all studies. Of these, 600 (66.7%) were female. Two studies included only female participants [24,26].

Other factors
How studies determined the target vertebral joints for SMT was variable. Nine studies pre-defined the target joint [22-24, 26-28, 30, 31, 34], while three allowed the treating practitioner to choose a target joint (within a region) based on history and examination findings [21,25,29]. One study used each approach for each of two SMT groups [32], and two studies did not report adequately on this topic [33,35]. Nine studies allowed practitioners to deliver a second SMT thrust if a cavitation was not achieved on the first thrust [21, 24, 26, 28-31, 33, 34]. Three studies delivered a fixed number of thrusts [22,23,35], and three did not report on this [25,27,32].

Quality of studies
See Table 4 for quality items and scores for the 13 studies measuring PPT. Articles were scored out of 12.
Mean and median quality scores were 6.6 (SD 1.2) and 6.5 respectively, with a range of 4.5-8. We chose to group articles post-hoc as lower, moderate, and higher quality, arbitrarily using scores of 4.5-5.5, 6-7, and 7.5-8 respectively as cut-points to assist discussion.

Answers to research questions
Is there a difference in PPT comparing SMT to sham, or SMT to control?
See Table 5 for results from the 13 studies measuring PPT. Out of four sham-controlled studies, two found no significant differences in change in PPT between SMT and sham groups. One of these was higher quality with confirmed blinding of participants [35], and the other was moderate quality with attempted but unconfirmed blinding [24]. The remaining two studies found a significant increase in PPT after cervical SMT compared to sham [26,34]. These were both of lower quality, with no reported attempt to blind participants. Three studies were included in the meta-analysis comparing change in PPT between SMT and sham (N = 92). There was minimal difference in results of the meta-analysis whether we assumed a correlation of 0.75 or 0.25 for calculations of combined variance. We will discuss results based on the correlation of 0.75, which gives slightly more conservative results. There was no significant difference in the mean change in PPT after SMT compared to sham (0.41 kg/cm2, CI -0.09 to 0.91). See Fig. 2 for a forest plot. Full results for all meta-analyses (including sensitivity analysis) are reported in Table 6.
Two studies compared SMT against control. The higher quality study found no significant difference in change in PPT between two SMT groups and a control condition with manual contact to the head [27]. The moderate quality study found a significant increase in PPT after cervical SMT compared to a control of quiet sitting, at two of three testing sites [30].

Is there a difference in PPT comparing SMT to mobilisation or other therapy?
Three studies compared changes in PPT between SMT and mobilisation. Each found no significant differences between groups. One study was moderate [31] and two were lower [25,29] quality. One of the three studies provided insufficient data for use in meta-analysis [25], thus a meta-analysis was not performed.
Four studies compared changes in PPT in two different SMT groups, with no significant differences between SMT comparisons. One study compared two different HVLA techniques in the same spinal region [28], and three studies compared SMT in different regions of the spine [21,27,32]. Three studies were higher [21,27,28] and one moderate quality [32]. These studies were too heterogeneous for meta-analysis.
There were no significant differences in PPT comparing SMT against extremity manipulation and against exercise, in a single higher quality study [33].
Meta-analysis (N = 693), based on combined variances calculated with a correlation of 0.75, revealed that the mean change in PPT from baseline in all SMT groups and all testing locations was 0.32 kg/cm2 (CI 0.22-0.42), with p < .001. See Fig. 3 for a forest plot, and Table 6 for full results. The mean baseline PPT (factoring in relative weightings as in the meta-analysis) was 2.94 kg/cm2, giving a mean increase of 10.9% in PPT over time after SMT.
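The percentage increase follows directly from the two pooled values reported above, as this short Python check shows. The function name `percent_change` is illustrative only.

```python
def percent_change(mean_change, baseline):
    """Change expressed as a percentage of the baseline value."""
    return 100 * mean_change / baseline

pooled_change = 0.32    # kg/cm2, pooled mean change in PPT after SMT
pooled_baseline = 2.94  # kg/cm2, weighted mean baseline PPT

print(f"{percent_change(pooled_change, pooled_baseline):.1f}%")  # prints 10.9%
```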
Are any changes in PPT local, regional, or remote?
There were six studies with a total of eight local PPT tests. Four studies observed a local increase in PPT following SMT. These studies were of higher [21,28], moderate [30], and lower [29] quality. Two studies observed no significant change in local PPT following SMT. One study had moderate [32] and one lower [25] quality.
There were nine studies with a total of 13 regional PPT tests. Five studies observed a regional increase in PPT after SMT. These were of higher [21,33], moderate [30], and lower [26,34] quality. One study of higher quality observed an increase in PPT at one out of four testing sites [27]. Three studies observed no regional change in PPT after SMT, of higher [35], moderate [32], and lower quality [25].
Eight studies tested PPT at a total of 22 remote sites. Five studies observed a remote increase in PPT after SMT. These were of higher [21,28,33] and moderate [31,32] quality. One higher quality study observed an increase in PPT at one site but not at three others [27]. Two studies did not observe a remote change in PPT after SMT. These studies were higher [35] and moderate [24] quality, one with confirmed and one with attempted blinding. We saw no relation between study quality and result. Meta-analyses revealed the mean change in PPT from baseline after SMT in local, regional, and remote areas to be 0.26 kg/cm2 (CI 0.11-0.41), 0.35 kg/cm2 (CI 0.18-0.52), and 0.37 kg/cm2 (CI 0.23-0.52) respectively, all with p ≤ .001 (based on correlation of 0.75 for combined variance calculation). Five studies could be included in the local subgroup, eight in the regional subgroup, and seven in the remote subgroup, with N = 383, N = 533, and N = 561 respectively. See Fig. 4, and Table 6 for full results.

Additional observations
Significant changes in PPT over time were not isolated to any one category of chronicity, and occurred in neck pain and extremity pain populations but inconsistently in low back pain populations. Studies investigating cervical SMT consistently demonstrated a significant increase in PPT over time, which was inconsistent after thoracic SMT and did not occur after lumbar SMT (regardless of musculoskeletal pain site).
Two studies measured PPT at multiple same-day follow-ups. A higher quality study observed that PPT increased from the immediate post-intervention to the 20 min follow-up, in two SMT groups [28]. A lower quality study had no consistent pattern over three short-term follow-ups [25].

Other types of quantitative sensory testing
Two studies measured temporal summation, one of which found that temporal summation decreased after lumbosacral SMT over time and compared to two exercise groups, in the lower extremity but not the upper extremity [22]. The other study did not analyse or report the post-intervention temporal summation data [33]. Suprathreshold heat response was measured in a single study [23], observing a significant decrease following lumbosacral SMT compared to sham and control conditions. Four studies investigated five other types of QST, including heat pain threshold [33,34], cold pain threshold [34], Aδ "first" pain [22], suprathreshold mechanical pain sensitivity [23], and aftersensations [23]. They all observed no significant change after SMT over time or compared to another intervention. Quality was not assessed in these studies.

Summary
To our knowledge this is the first systematic review studying the literature on changes in QST measures after HVLA SMT in populations with musculoskeletal pain. Our results indicate that PPT increases systemically over time following SMT in musculoskeletal pain populations in the short term. However, there was no significant difference when compared to sham manipulation. Based on a few studies, there were also no differences between SMT and control or other interventions, which included mobilisation, exercise, and other types of SMT. There were too few studies investigating other types of QST to make robust conclusions.

Explanation and comparisons
Effect of spinal manipulative therapy on pressure pain threshold
There is low quality evidence that SMT does not provide an increase in PPT beyond that observed after a sham manipulation. With a sample size of 92 and only three studies included in the SMT versus sham meta-analysis, it is possible that the meta-analysis is underpowered and at risk of producing a false negative result. We also acknowledge that the sham manipulations could technically be considered as low-grade sustained mobilisations. It is therefore possible that they may elicit a neurophysiological response in their own right, which would confound the results. However, in three of four sham groups there was no increase in PPT after intervention, suggesting minimal placebo effect occurred in these studies. Based on these findings, it is difficult to speculate on whether the significant change over time after SMT observed in the included studies is due to treatment-specific effects or non-specific effects (expectation and contextual factors involved in the delivery of SMT). The systemic nature of the change over time would suggest that a central and systemic hypoalgesic mechanism may be at play, rather than local or regional. It is also important to note that few studies measure beyond 5-10 min post-intervention, hence we could only investigate very short term change in PPT, further limiting the clinical applicability of our results. Heterogeneity in the meta-analyses was high, reflecting the significant between-study variation (in populations, QST locations, and interventions). This suggests that there are real differences in effect sizes between studies.
Our review agrees with prior reviews on the topic of MIH for PPT change over time, and builds upon their conclusions by offering support for the systemic nature of the change [10-12]. Our SMT versus sham results are in contrast to the review by Honoré et al. [13], which concluded in favour of a specific treatment effect in asymptomatic populations. In attempting to compare our meta-analysis results with those of Coronado et al. [10], we noted that their meta-analysis encompassed all between-group differences, with comparators including active, sham, and control interventions. The difference is reported as Hedges' g = 0.32, which is a small effect size. It is difficult to compare against our results, since we consider their meta-analysis inappropriate given that the comparators are highly heterogeneous.
Changes in PPT are most consistently demonstrated after cervical SMT and fairly consistently after thoracic SMT. Both lumbosacral SMT studies showed no change, agreeing with the findings of a review in asymptomatic populations [13]. It is possible that changes over time in PPT do not occur after lumbosacral SMT. The changes over time are not isolated to particular musculoskeletal pain populations, appearing to occur regardless of chronicity and spinal or non-spinal pain site.

Clinical relevance of change in pressure pain threshold
The clinical relevance of PPT is an important consideration. It is pertinent to note that statistically significant changes in many short term outcome measures following manual therapy are common and should be interpreted cautiously, since they don't necessarily relate to clinically important outcomes over meaningful time periods [36]. Articles have stated values for clinically relevant change in PPT of 15% [11] and 1.1 kg/cm2 [37], but on inspecting their references neither of these is based on the relationship between change in PPT and change in clinically relevant outcome measures. The origin of the 15% value [11] cannot be traced to any provided references, and the 1.1 kg/cm2 value [37] was calculated via a distribution-based method for estimating clinically important difference [38], using effect sizes and standard error of the mean of PPT. There is some evidence, however, that PPT is responsive to change in symptoms, especially to rule in change, based on a study in which change in PPT at the upper trapezius (particularly a change of over 0.86 kg/cm2) had high specificity and moderate sensitivity for concurrent change in neck pain over 1 week follow-up [39].
In the absence of a clearly defined and valid minimum clinically important difference, it is valuable to consider the minimum detectable change in PPT, which is the minimum change that would be greater than measurement error or chance. This has been calculated as between about 0.5 and 3.4 kg/cm2 (20-50% change) for PPT [19,39-41]. The change over time results in our review (0.32 kg/cm2 or 10.9%) are clearly less than the proposed minimum detectable change. Therefore, we cannot rule out the effect of measurement error and chance on the results.
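For readers unfamiliar with the concept, the sketch below shows one commonly used formulation of minimum detectable change: MDC95 = 1.96 × √2 × SEM, where SEM is the standard error of measurement, here estimated as SD × √(1 − ICC). This is an assumption for illustration; the cited studies may have used other methods, and the SD and ICC values below are hypothetical. Only the pooled change of 0.32 kg/cm2 and the lower bound of 0.5 kg/cm2 come from the text above.

```python
import math

def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and a reliability ICC."""
    return sd * math.sqrt(1 - icc)

def mdc95(sem):
    """Minimum detectable change at 95% confidence for a test-retest difference."""
    return 1.96 * math.sqrt(2) * sem

# Hypothetical reliability data for PPT (not taken from the cited studies).
sem = sem_from_icc(sd=1.0, icc=0.90)
print(round(mdc95(sem), 2))

# Comparison reported in the text: pooled change vs lowest published MDC.
pooled_change = 0.32      # kg/cm2
lowest_reported_mdc = 0.5  # kg/cm2
print(pooled_change < lowest_reported_mdc)
```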

Effect of spinal manipulative therapy on other types of quantitative sensory testing
Our review found single studies for each of temporal summation and suprathreshold heat response that observed a significant reduction following SMT, compared to exercise and sham respectively. The review by Millan et al. [12] comments that temporal summation does not change after SMT. However, on inspecting the three temporal summation studies they included (one of which was also included in our review), temporal summation was reduced after SMT in each study, suggesting a mistake on the authors' part. Changes in temporal summation after SMT may be worth further study.
Five other types of QST, including thermal pain thresholds, did not change over time or compared to other interventions. Two prior reviews also conclude that there are no changes in thermal pain thresholds [11,12], based on a total of seven unique studies with some overlap between reviews. Thus it appears likely that thermal pain thresholds do not change after SMT.

Methodological considerations for this review
We consider it a strength that a comprehensive literature review revealed a large number of studies that fit our criteria, and we were able to perform quantitative analysis to complement the qualitative review. Our quality assessment tool was developed to fit our specific research questions. This may be considered as both a strength, since only relevant items were considered, and also a weakness, since the tool is not standardised. However, there was no standardised tool that fit our needs. We also acknowledge concerns regarding the use of summary scores for assessing study quality, hence we did not exclude studies based on quality but used it as a guide to interpretation.

Methodological considerations for included studies
Several pertinent quality-related items were addressed poorly in the included studies. Firstly, we have concerns about the sham interventions. Only two studies [24,35] reported attempting to blind participants, one of which confirmed that blinding was effective [35]. All four sham-controlled studies [24,26,34,35] used a sham that involved holding the participant in a pre-manipulative position, but without a thrust or joint tension. This would account for some, but not all, of the factors proposed by Puhl et al. [42] as important for sham manipulation. They suggest that, in order for a sham manipulation to be convincing, consideration should be given to replicating (or concealing) the physical contact between patient and practitioner, the motions induced during the procedure, the thrust, and the sound (cavitation). Thus it is questionable whether expectation effects were effectively accounted for in three of the four studies. Inadequate control for placebo effects increases the likelihood that results would favour SMT, though we found no significant difference in PPT between SMT and sham in the face of this.
Worryingly, all but one study failed to report on missing data and any subsequent imputation methods. Two studies [24,25] also failed to report adequately on within-group change over time. Sample size calculations were inadequate in numerous studies; only six had an appropriate sample size calculation based on PPT estimates and achieved the required power. For sample size calculations for PPT, we suggest consulting the article and corrigendum cited here [43,44]. Few studies noted whether study participants were kept naïve to the study aims, which may be valuable in reducing expectancy effects.
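As a rough illustration of the kind of calculation involved, the sketch below applies the standard normal-approximation formula for comparing two group means, n per group = 2(z(1-α/2) + z(1-β))² σ² / δ². It is not the method described in [43,44]. The function name and the input values (a between-group PPT difference of 0.5 kg/cm2 and a standard deviation of 1.0 kg/cm2) are hypothetical and chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Participants needed per group for a two-sided comparison of two means.

    Normal approximation; delta and sd must share units (e.g. kg/cm2).
    delta: smallest between-group difference worth detecting (assumed).
    sd:    common standard deviation of the outcome (assumed).
    """
    if delta <= 0 or sd <= 0:
        raise ValueError("delta and sd must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = std_normal.inv_cdf(power)           # desired power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical inputs, not taken from the included studies or from [43,44].
print(n_per_group(delta=0.5, sd=1.0))  # 63 per group at alpha 0.05, power 0.80
```

Because the required sample size scales with (σ/δ)², halving the detectable difference roughly quadruples the number of participants needed, which is one reason why PPT-specific variance estimates matter.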
A single study controlled statistically for psychosocial factors (e.g. anxiety, depression, pain catastrophizing) [29]. The influence of psychosocial factors on QST measures is disputed; they have been shown to be both relevant [22,39,45–47] and irrelevant [22,39,47] in various situations. They may be especially pertinent in clinical populations, though randomised controlled trials with QST tend to have poor external validity, thus psychosocial factors may have different importance in these types of trials. We suggest that researchers consider administering psychosocial questionnaires, allowing them to see if statistical control is appropriate based on their data.
We are pleased that all studies utilised assessor blinding and that most studies reported losses and exclusions and between-group differences appropriately.

Recommendations for future research
There are various limitations to the present studies on MIH, and further studies may shed additional light on the topic. The significant heterogeneity between studies is problematic, as is the lack of quality sham-controlled studies. As Millan et al. [12] suggested, there is a need to focus on more specific research questions in MIH research. We suggest one of the next critical steps is to determine the clinical relevance of MIH. Do changes in QST after SMT relate to clinical features and, more importantly, clinical outcomes for patients? If not, then we can presume that while MIH may represent some specific or non-specific neurophysiologic response, it does not in itself explain the positive clinical outcomes commonly seen after SMT. With these points in mind, Table 7 contains a list of recommendations for future research on MIH.
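Between-study heterogeneity of the kind noted above is commonly summarised with Cochran's Q and the I² statistic. The sketch below shows that calculation under a fixed-effect (inverse-variance) weighting. It is an illustration only, not a restatement of the analysis performed in this review, and the function name and example numbers are hypothetical.

```python
def cochran_q_i2(effects, variances):
    """Return (pooled effect, Cochran's Q, I2 in percent).

    effects:   per-study effect estimates (e.g. change in PPT, kg/cm2).
    variances: per-study variances of those estimates.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 is the share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical study-level estimates, for illustration only.
pooled, q, i2 = cochran_q_i2([0.2, 0.4, 0.9], [0.01, 0.02, 0.04])
print(f"pooled={pooled:.2f}, Q={q:.2f}, I2={i2:.0f}%")
```

Standardising QST locations and protocols across trials would be expected to lower Q relative to its degrees of freedom, and with it I².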

Conclusion
We considered the articles to be generally of low quality. We found systemically increased pressure pain thresholds (reduced sensitivity) over time after SMT of roughly 10% in musculoskeletal pain populations. There was low quality evidence of no difference in PPT after SMT compared to sham manipulation. There were insufficient studies comparing SMT with other interventions, or using other types of QST, to make further robust conclusions. We make several recommendations for future MIH research. In particular, research into the clinical relevance of MIH, and into different types of QST, is likely to be the most valuable.

Table 7 Recommendations for future research on MIH
1. Use CONSORT guidelines to improve study quality and reporting.
2. Consider measuring other types of QST apart from PPT, e.g. temporal summation.
3. Measure QST in a variety of locations, e.g. local, regional, remote.
4. To address the significant between-study heterogeneity, consider choosing commonly used QST locations, standard QST protocols, and commonly used intervention protocols.
5. Consult the following article and corrigendum [43,44] for appropriate sample size estimations for PPT.
6. Consider including a sham group, and ensure sham interventions are appropriate and believable, and assess the effectiveness of blinding.
7. Consider comparing changes in QST against clinically relevant baseline features or treatment outcomes.
8. Consider assessing psychosocial variables at baseline for use as modifiers/confounding variables, if appropriate based on statistical analysis.