Responsiveness of clinical tests for people with neck pain

Background Responsiveness of a clinical test is highly relevant in order to evaluate the effect of a given intervention. However, the responsiveness of clinical tests for people with neck pain has not been adequately evaluated. The objective of the present study was to examine the responsiveness of four clinical tests which are low cost and easy to perform in a clinical setting: the craniocervical flexion test, cervical active range of movement, test for the cervical extensors and pressure pain threshold testing. Methods This study is a secondary analysis of data collected in a previously published randomised controlled trial. Participants were randomised to either physical training, exercises and pain education combined or pain education only. Participants were tested on the clinical tests at baseline and at the 4-month follow-up. An anchor-based approach using Receiver Operating Characteristic (ROC) curves was used to evaluate responsiveness of the clinical tests. The Neck Disability Index was used to discriminate between those who had improved and those who were unchanged at the 4-month follow-up. Minimum Clinically Important Difference (MCID), together with sensitivity, specificity, positive and negative predictive values, and positive and negative likelihood ratios, were calculated. Results In total, 164 participants completed the 4-month follow-up. One hundred forty-four participants were classified as unchanged, whereas 20 were considered to be improved. Twenty-six participants did not complete all of the clinical tests, leaving a total of 138 to be included in the analyses. Area Under the Curve (AUC) ranged from 0.50 to 0.62 for the clinical tests, and all values were below an acceptable level. MCID was generally large, and the corresponding sensitivity and specificity were low, with sensitivity ranging from 20 to 60% and specificity from 54 to 86%. LR+ (0.8-2.07) and LR- (0.7-1.1) showed low diagnostic value for all variables, with PPV ranging from 12.1 to 26.1 and NPV ranging from 84.7 to 89.2. Conclusion Responsiveness of the included clinical tests was generally low when using change in NDI score from baseline to the 4-month follow-up as the anchor. Further investigations of responsiveness are warranted, possibly using other anchors that more closely reflect the dimensions measured by the clinical tests.


Background
Musculoskeletal disorders are the most common form of long-term illness, with neck pain disorders rated as the most frequent complaint in Denmark [1]. It is estimated that 70% of the population will experience neck pain during their lifetime [2], and the one-year prevalence has been reported to be approximately 35% [3].
People with a Whiplash Associated Disorder (WAD), as well as those with idiopathic neck pain, often develop neck symptoms lasting for more than 3 months [2,4]. People with chronic neck pain often present with a variety of other symptoms with a potentially large impact on function and quality of life [5]. Several treatment modalities have been evaluated for neck pain; however, there is little evidence to suggest that one treatment is superior to others [6,7]. The reasons for this may be that 1) the investigated treatment modalities have not been effectively targeted to the right patients, 2) the intervention has not focused on relevant functions, and/or 3) the clinical outcome measures used are not sufficiently reliable, valid and responsive to detect a change beyond measurement error.
Clinical tests have been developed for people with neck pain which target the assessment of neuromuscular control and function, such as strength and endurance of the deep neck flexors and extensors [8][9][10][11]. Additionally, tests for sensorimotor control such as head repositioning, postural control and head-eye coordination are often included in the clinical examination of patients with chronic neck pain [12][13][14]. Moreover, since both primary and secondary hyperalgesia are often evident in people with chronic neck pain, tests for pressure pain sensitivity are increasingly used during the clinical examination [15].
In order for a test to be useful in a clinical setting, it needs to be low cost, safe, easy to use and operational within the time frame of a clinical assessment. Furthermore, it has to be reliable, valid and responsive to detect changes. In a previous study, clinical tests including the craniocervical flexion test (CCFT), cervical active range of motion (ROM), gaze stability (GS), smooth pursuit neck torsion test (SPNTT), test for the cervical extensors (CE), balance tests using sway measurements (SWAY) on a Wii balance board and Pressure Pain Threshold (PPT) tests all showed satisfactory reliability, construct and discriminative validity in people with chronic neck pain and in asymptomatic controls [16]. However, the responsiveness, that is the ability of a test to detect a change, remains unknown for these tests. Yet estimating responsiveness of a clinical test is highly relevant in order to evaluate the effect of a given intervention. Three systematic reviews evaluating clinimetrics of cervical muscle function, ROM and cervical sensorimotor control concluded that the responsiveness of such tests was insufficiently described [17][18][19]. Only PPT tested over the upper trapezius muscle was reported as having acceptable responsiveness when tested in people with neck pain [20]. Therefore, the objective of the present study was to examine the responsiveness of four clinical tests with continuous variables for people with chronic neck pain, which included CCFT, ROM, CE and PPT, since these tests are commonly used in the clinical setting to evaluate the effect of an intervention.
It was hypothesized that the change scores of the included clinical tests from baseline to 4 months following an active intervention [21] would correlate with the change in Neck Disability Index (NDI) score over the same time period. It was further hypothesized that all clinical test variables would have an acceptable level of responsiveness.

Study design
The study is a secondary analysis of data collected in a previously reported randomised controlled trial [21]. Participants were recruited between September 2013 and October 2015 from eight different physiotherapy clinics, three hospital units in Jutland and Funen, Denmark, in addition to a municipality on Funen, Denmark. The intervention consisted of 1) physical training, exercises and pain education combined, compared with 2) pain education only, and the protocol has previously been described in detail [22]. The clinical testing procedure followed a standardized protocol [16], and was performed by two experienced musculoskeletal physiotherapists (IR and RJ). Participants were tested at baseline and at the 4-month follow-up. The 4-month follow-up was chosen since it was expected that the intervention of specific exercises would have a measurable effect within this time frame [23]. Prior to the study, examiners were trained to ensure standardization of test procedures.

Study population
Patients were included according to the following criteria: 18 years or older receiving physiotherapy treatment or having been referred for physiotherapy treatment for chronic neck pain with at least 6 months duration, and reduced physical neck function (NDI-score of at least 10/50) [24], pain primarily in the neck region, and the ability to read and understand Danish.
Patients were excluded if they had neuropathies/radiculopathies (defined by positive Spurling, cervical traction and plexus brachialis tests) [25], were in an unstable social and/or working situation, were pregnant, had known fractures, or had depression according to the Beck Depression Inventory (score > 29) [26].
Subjects received oral and written information about the project and gave their written informed consent to participate. The Regional Scientific Ethical Committee of Southern Denmark approved the study (S-20100069), and the study conformed to The Declaration of Helsinki 2008.

Self-reported outcome measures
Demographic data were collected, including age, gender, height, weight and type of accident (if any). Change from baseline to the 4-month follow-up on the NDI was used as the criterion measure of clinically important change, with higher change scores representing greater recovery. The NDI is a measurement tool covering activities of daily living in people with neck pain. It consists of 10 items with 6 ordinal response categories ranging from 0 to 5, giving a maximum score of 50, with higher scores representing greater disability. The reliability and validity of this measure are acceptable [24,27,28]. The NDI was selected as the anchor for responsiveness of the clinical neck tests because it is a self-reported variable closely related to neck function.
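The scoring described above can be sketched in a few lines of Python. This is an illustration only, not the study's code: the function names and the example item scores are hypothetical, and the handling of missing items (not described here) is left out.

```python
def ndi_total(item_scores):
    """Sum ten NDI item scores (each 0-5) into a total out of 50."""
    if len(item_scores) != 10:
        raise ValueError("the NDI has exactly 10 items")
    if any(not 0 <= s <= 5 for s in item_scores):
        raise ValueError("each NDI item is scored from 0 to 5")
    return sum(item_scores)


def ndi_change(baseline_total, followup_total):
    """Change score; positive values mean less disability at follow-up."""
    return baseline_total - followup_total


# Hypothetical participant, not study data.
baseline = ndi_total([3, 2, 3, 2, 2, 3, 2, 1, 2, 2])
followup = ndi_total([2, 1, 2, 1, 1, 2, 1, 1, 1, 1])
print(baseline, followup, ndi_change(baseline, followup))
```

Subtracting the follow-up score from the baseline score makes a higher change score correspond to greater recovery, matching the convention stated in the text.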

Clinical tests
The tests have been described in detail elsewhere and have shown acceptable reliability and construct and discriminative validity [16,21]. Therefore, they will only be presented briefly.
Craniocervical Flexion Test (CCFT) was performed using a Pressure Biofeedback Unit (Stabilizer; Chattanooga Group, USA), as described by Jull et al. [11]. The subject was asked to perform craniocervical flexion in five incremental stages guided by the pressure sensor. The activation score has six scoring options: 20, 22, 24, 26, 28 and 30 mmHg.
Cervical Extensor (CE) test was performed in the prone position with the head and neck over the edge of the bed. A laser was fixed on top of the subject's head and projected onto a target. The duration of time the laser beam was kept within the center of the target was measured in seconds (s), as a measure of cervical extensor muscle endurance.
Range of movement (ROM) was examined using a bubble inclinometer (Baseline Bubble Inclinometer, Fabrication Enterprises Inc., USA) for flexion/extension and lateral flexion, and with custom-made equipment for neck rotation. All movements were registered to the nearest degree, except for rotation, which was registered to the nearest 5 degrees.
Pressure Pain Threshold (PPT) was examined bilaterally at three sites (cervical spine C5/C6 segment, M. infraspinatus and M. tibialis anterior, the latter being a reference value) using a hand-held algometer (Wagner, FPX algometer, USA), and measured in kilogram-force (kgf). Only data from the cervical spine and M. tibialis anterior sites are presented in the current study.

Statistical analysis
An anchor-based approach using Receiver Operating Characteristic (ROC) curves was used to examine responsiveness of the clinical tests. ROC curves were used to evaluate the ability of the clinical tests to detect change in NDI from baseline to the 4-month follow-up. NDI was used to discriminate between those who had improved and those who were unchanged or worsened at the 4-month follow-up. Several studies have reported minimal clinically important difference (MCID) estimates of the NDI for different study populations, ranging from 3.5 to 7.5 for mechanical and non-specific neck pain [29][30][31][32]. Based on a systematic review [27], a change of more than 7 points on the NDI was chosen as the cut-off for classification as improved, while all change scores of 7 or less were classified as not improved. Between-group differences (improved vs. not improved) were compared at baseline using independent t-tests for parametric data and the Mann-Whitney U test for nonparametric data.
Change scores from the clinical tests were correlated with the change scores of the NDI using Pearson's correlation. A correlation coefficient of 0.30-0.35 was considered acceptable, as previously recommended for questionnaires when defining the acceptable correlation between anchor and test [33].
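As a minimal sketch of this step, Pearson's r between two sets of change scores can be computed as below. The analyses in this study were run in Stata; the Python function, the variable names and the example values are hypothetical and serve only to make the calculation explicit. Both change scores are assumed to be coded so that higher values mean improvement.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


ACCEPTABLE_R = 0.30  # lower bound of the 0.30-0.35 range cited in the text

# Hypothetical change scores, not study data.
test_change_scores = [12, 3, 4, -2, 6, 9, 0, 5]
ndi_change_scores = [9, 2, 8, 0, 3, 10, 1, 5]

r = pearson_r(test_change_scores, ndi_change_scores)
print(round(r, 2), r >= ACCEPTABLE_R)
```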
The change score from baseline to the 4-month follow-up for each of the clinical tests was the independent variable, and the dichotomized change in NDI (above 7 versus 7 or below) was the dependent variable. The corresponding ROC curve plots sensitivity (true positive rate) against 1-specificity (false positive rate). Area under the curve (AUC) (95% CI) was used as an indicator of responsiveness, assessing the test's ability to distinguish between patient groups defined by NDI change scores above versus 7 or below. An AUC of 0.50 indicates no discriminative ability beyond chance, whereas an AUC of 1 represents the ability to correctly discriminate all patients. An AUC of 0.7-0.8 is considered acceptable, and 0.8-0.9 is considered excellent [34].
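The AUC has a direct interpretation that a short sketch can make concrete: it equals the probability that a randomly chosen improved participant has a larger test change score than a randomly chosen not-improved participant, with ties counted as one half. The Python below illustrates this with hypothetical data and names; it assumes that larger change scores on the clinical test indicate improvement, and it is not the Stata procedure used in the study (confidence intervals are omitted).

```python
def auc_from_changes(test_change, improved):
    """AUC via pairwise comparison of improved vs. not-improved participants."""
    pos = [c for c, flag in zip(test_change, improved) if flag]
    neg = [c for c, flag in zip(test_change, improved) if not flag]
    if not pos or not neg:
        raise ValueError("both groups must contain at least one participant")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # improved participant ranked higher
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


NDI_CUTOFF = 7  # improved if NDI change is greater than 7 points

# Hypothetical change scores, not study data.
ndi_change_scores = [9, 2, 8, 0, 3, 10, 1, 5]
test_change_scores = [12, 3, 4, -2, 6, 9, 0, 5]

improved = [c > NDI_CUTOFF for c in ndi_change_scores]
print(auc_from_changes(test_change_scores, improved))
```

A value near 0.5 from such a calculation means the test change scores order improved and not-improved participants no better than chance.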
The MCID was determined as the change score offering the best discrimination between the improved and unchanged/worsened groups (greatest sensitivity and specificity). On the ROC plot, the upper left-hand corner represents the optimum condition, where sensitivity is maximized and 1-specificity is minimized (that is, specificity is also maximized). Positive and negative predictive values were calculated, representing the probability of a true positive change given a positive test result and of a true negative change given a negative test result [35]. Positive and negative likelihood ratios were also calculated, defined as the probability of a positive (respectively negative) test result for a person who has changed divided by the probability of a positive (respectively negative) test result for a person who has not changed. For all statistical analyses, the STATA statistical package was used (Stata Corp., 2000, Stata Statistical Software: Release 14, College Station, TX, USA).
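The quantities defined above can be illustrated with a short Python sketch. Two assumptions are made that the text does not state: a test result counts as positive when the change score is at or above the candidate cut-off, and "greatest sensitivity and specificity" is operationalized as the maximum of sensitivity + specificity - 1 (Youden's index). All names and data are hypothetical, and this is not the Stata routine used in the study.

```python
def two_by_two(test_change, improved, cut):
    """Count TP, FP, FN, TN when a positive test means change >= cut."""
    tp = fp = fn = tn = 0
    for c, flag in zip(test_change, improved):
        positive = c >= cut
        if positive and flag:
            tp += 1
        elif positive and not flag:
            fp += 1
        elif not positive and flag:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def accuracy_measures(tp, fp, fn, tn):
    sens = safe_div(tp, tp + fn)
    spec = safe_div(tn, tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": safe_div(tp, tp + fp),        # P(improved | test positive)
        "npv": safe_div(tn, tn + fn),        # P(not improved | test negative)
        "lr_pos": safe_div(sens, 1 - spec),  # sens / (1 - spec)
        "lr_neg": safe_div(1 - sens, spec),  # (1 - sens) / spec
    }


def best_cut(test_change, improved):
    """Observed change score with the largest sensitivity + specificity - 1."""
    best = None
    for cut in sorted(set(test_change)):
        m = accuracy_measures(*two_by_two(test_change, improved, cut))
        j = m["sensitivity"] + m["specificity"] - 1
        if best is None or j > best[1]:
            best = (cut, j, m)
    return best


# Hypothetical data, not study data.
test_change_scores = [12, 3, 4, -2, 6, 9, 0, 5]
improved = [True, False, True, False, False, True, False, False]
cut, youden, measures = best_cut(test_change_scores, improved)
print(cut, round(youden, 3), measures)
```

Other rules for choosing the cut-off exist, such as the point closest to the upper left-hand corner of the ROC plot, and they can select a different score on the same data.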

Results
A total of 200 patients were included in the original RCT [21]. Of these, 164 completed the 4-month follow-up and were eligible for the present study. At the 4-month follow-up and according to the described groups classified by their NDI score, a total of 144 (86%) were classified as unchanged, and 20 (14%) as having improved. In the unchanged group, 26 participants completed the 4-month follow-up but did not complete the clinical tests, and therefore 118 were included in the analysis.
The two groups did not differ in their baseline demographic characteristics (Table 1), except for duration of symptoms: on average, the improved group had had symptoms for significantly longer than the unchanged group. No significant differences in mean change score between the unchanged and improved groups were found for any of the clinical tests (Table 2). ROM in neck extension was close to statistical significance (6.34 s (−0.29 to 12.96), p = 0.06).
Correlations between change scores on the NDI and the clinical tests were estimated using Pearson's r and ranged from 0.09 to 0.21, all below the acceptable level of at least 0.3. Significant correlations were found for ROM in extension and lateral flexion to the right and PPT at C5 left. AUC ranged from 0.50 to 0.62 (at or just above the level of no discriminative ability beyond chance), and all values were below the recommended acceptable level of at least 0.7 (Table 3).
MCID was generally large, and the corresponding sensitivity and specificity were low with sensitivity measures ranging from 20 to 60% (highest for ROM), while specificity ranged from 54 to 86% (highest for CCFT and PPT) (Table 4).

Discussion
Responsiveness of the clinical tests evaluated in this study was generally poor when using an NDI change of more than 7 points from baseline to the 4-month follow-up as the anchor for improvement in people with chronic neck pain. AUC was low for all variables. Likewise, all variables (CCFT, ROM, CE and PPT) demonstrated unsatisfactory correlations with NDI, and the MCID was large.
To the best of our knowledge, this is the first study to assess responsiveness of clinical tests for people with neck pain, since only PPT variables have been evaluated previously in non-chronic neck pain patients [20]. The previous study demonstrated satisfactory responsiveness for PPT measured over the upper trapezius (AUC = 0.76; 95% CI: 0.57;0.89) but not for PPT measured over the tibialis anterior (AUC = 0.65; 95% CI: 0.46;0.84) [20], which is in contrast to the current findings. There could be several reasons for these contrasting results. Firstly, the study population differs between the two studies, as Walton and colleagues [20] included people with acute or chronic neck pain, as opposed to the current study, which included only people with chronic pain, and it is likely that differences in severity, symptoms and pain mechanisms affect responsiveness of PPT. Secondly, although both studies used an anchor-based method to measure responsiveness, the anchor differed between studies. Walton et al. used Global Perceived Effectiveness (GPE), in contrast to the current study, which used the NDI [20]. Choosing GPE as an anchor for real change, as in some previous studies [29,31,37], may introduce bias due to the subjectiveness of GPE and the questioned reliability and validity of this measure [33,38]. In the current study with a 4-month follow-up, recall bias might have been present had GPE been selected, since previous studies have shown GPE to have a higher correlation with present than with initial status [38,39]. Moreover, GPE is a generic health-related outcome, as opposed to the specific tests evaluated in this study, which is why the NDI, with its greater emphasis on self-reported neck function, was selected. However, the correlations between the anchor and the clinical tests were all below the previously set level of acceptance (0.3), indicating that changes in the current clinical tests are not sufficiently reflected by changes on the NDI.
Since the NDI has been criticised for poor sensitivity to longitudinal changes [40], it seems questionable whether large changes on the NDI reflect a longitudinal change in the current study.
The choice of the cut-off (a change of more than 7 points on the NDI) is important for determining the responsiveness. The current cut-off was selected based on the MCID calculated on the NDI in the previously reported systematic review [27]. Choosing another cut-off, for instance a change of 3 points on the NDI, as in some previous studies [31,32], and/or different cut-offs for the different tests, may have resulted in different estimates. However, post-hoc analysis with a cut-off of 3 change points did not change the estimates considerably.
The current MCID values were all lower than the previously reported Minimum Detectable Change (MDC) [16], except for CCFT, meaning that the calculated MCID could be attributed to measurement error. The current MCID is based on dichotomization of patients as improved and not improved, and does not take into account whether the clinical status in other areas has actually changed. However, PPV and both likelihood ratios were all below the acceptable levels.
Limitations of this study are the relatively small sample in the improved group (although the total group was large) and the difficulty of identifying an appropriate anchor that measures the same dimensions as the clinical tests. The NDI does not seem to be an optimal anchor, given the small group of responders. Since pain is one of the main complaints in this patient group, a measure of pain intensity (e.g., Visual Analogue Scale) could be suitable for classifying patients as improved or worsened. The appropriateness of alternative new neck instruments as an anchor remains to be studied in the future. In addition, the clinical tests used may only be classified as semi-objective.
A strength of the present study is that it followed a strict and standardised protocol [21] with a detailed description of, and training in, the clinical tests and their interpretation. In addition, the study was performed in a clinical setting using simple and low cost clinical tests, previously shown to have satisfactory reliability [16], aiming for high generalizability.

Conclusion
In conclusion, responsiveness of the included clinical tests (CCFT, ROM, CE and PPT) was generally low when using an NDI change score greater than 7 from baseline to the 4-month follow-up as the anchor. A major limitation is the use of the NDI as an anchor, and further investigations of responsiveness are warranted, possibly using other anchors, for instance pain measures, that more closely reflect the dimensions measured by the current clinical tests.