The mechanical and inflammatory low back pain (MIL) index: development and validation

Background The purpose of this study was the development of a valid and reliable “Mechanical and Inflammatory Low Back Pain Index” (MIL) for assessment of non-specific low back pain (NSLBP). This 7-item tool assists practitioners in determining whether symptoms are predominantly mechanical or inflammatory. Methods Participants (n = 170, 96 females, age = 38 ± 14 years) with NSLBP were referred to two Spanish physiotherapy clinics and completed the MIL and the following measures: the Roland Morris Questionnaire (RMQ), the SF-12 and the “Backache Index” (BAI) physical assessment test. For test-retest reliability, 37 consecutive patients were assessed at baseline and three days later during a non-treatment period. Face and content validity, practical characteristics, factor analysis, internal consistency, discriminant validity and convergent validity were assessed from the full sample. Results A total of 27 potential items that had been identified for inclusion were subsequently reduced to 11 by an expert panel. Four items were then removed due to cross-loading under confirmatory factor analysis where a two-factor model yielded a good fit to the data (χ2 = 14.80, df = 13, p = 0.37, CFI = 0.98, and RMSEA = 0.029). The internal consistency was moderate (α = 0.68 for MLBP; 0.72 for ILBP), test-retest reliability was high (ICC = 0.91; 95%CI = 0.88-0.93) and discriminant validity was good for both MLBP (AUC = 0.74) and ILBP (AUC = 0.92). Convergent validity was demonstrated through similar but weak correlations between the ILBP and both the RMQ and BAI (r = 0.34, p < 0.001) and the MLBP and BAI (r = 0.38, p < 0.001). Conclusions The MIL is a valid and reliable clinical tool for patients with NSLBP that discriminates between mechanical and inflammatory LBP.


Background
Low back pain (LBP) is a source of considerable financial and societal costs [1]. Its natural course has been described either as self-limiting, where 3-10% become chronic [2], or as recurrent [3] and unfavorable [4], where up to 62% still experience pain after one year [5]. In most cases a specific diagnosis for LBP cannot be defined on the basis of anatomical or physiological abnormalities alone [6]. A subgroup classification approach in RCTs that matches patients with non-specific low back pain (NSLBP) to the treatment they receive has demonstrated better outcomes than a homogeneous classification approach [7]. Consequently, it would seem likely that patients with NSLBP represent a heterogeneous collection of conditions and that subgroup identification with tailored therapies may improve clinical outcomes [8,9]. However, attempts to achieve this on an anatomical or physiological basis have not been shown to be significantly more effective than other approaches [7]. It is therefore crucial to identify subgroups within the broad NSLBP classification on the basis of physical signs and symptoms [10].
Over the last decade there has been a tendency in manual therapy subgroups to conceptualize and manage NSLBP as "mechanical" and/or "inflammatory" [11,12]. Although these labels do not have universally accepted definitions, there is evidence to support both mechanical and inflammatory factors as being involved in the generation of NSLBP [13-16]. Two notionally contrasting approaches follow this logical separation: "predominant mechanical" treatments such as exercise [6], traction, mobilization and manipulation [9]; and "predominant anti-inflammatory" treatments such as electromodality approaches [17], non-steroidal anti-inflammatory medications and corticosteroid injections [18]. However, exercise also has an anti-inflammatory effect, as evidence indicates that it protects against chronic diseases with low-grade inflammation such as diabetes and cardiovascular conditions [19].
In the presence of identifiable anatomical or physiological abnormalities, specific therapies or interventions can be utilized. With NSLBP, however, only an empirical approach can be employed [20]. Although some reviews of NSLBP treatments have shown the benefits of physical and pharmacological interventions, these studies concede that the effect sizes are often small and the differences are minimal when additional therapy interventions are included [6,21,22]. This apparent lack of effect may be due in part to the classification of NSLBP as a homogeneous condition rather than a heterogeneous collection of undefined but differing conditions, some of which may respond to specific therapeutic interventions [8]. An example of the heterogeneous approach is where patients diagnosed with NSLBP are identified as either mechanical (MLBP) or inflammatory (ILBP) [23]. It would therefore seem advantageous to attempt to divide LBP sufferers into these groups, as they may respond more readily to separate treatment approaches.
The a-priori hypothesis of this study was that a new tool with two dimensions could be developed in order to distinguish between LBP of a mechanical (MLBP) and an inflammatory (ILBP) source. The specific objectives of this study were three-fold: (1) to propose a two-factor model representing MLBP and ILBP levels by exploratory factor analysis (EFA); (2) to ratify this model with confirmatory factor analysis (CFA); and (3) to utilize the CFA results in order to construct and validate summative scales of the standardized values of the index that facilitate assessment of MLBP and/or ILBP.

Design
A two-phase, prospective, observational study was conducted involving the development and subsequent validation of a Mechanical and Inflammatory low back pain (MIL) index.

Phase 1: Mechanical and Inflammatory LBP (MIL) Index development
A total of 27 items indicating signs and symptoms of potential mechanical and inflammatory NSLBP were extracted from the Walker and Williamson study [23] and assembled in a usable, testable format (Additional file 1). A panel of five experts was formed as a part of the content validity assessment and included a sports physician, a rheumatologist, a general practice physician and two physiotherapists. Each panel member was experienced in treating back pain, had worked in both the clinical and research environments and presented their opinions as a representation of their field of expertise and qualification.
This panel identified areas of omission and item improvement or modification through a consensus approach using the content validity guidelines of a minimum of four votes with an average score of 3 on a four-point ordinal scale. This enabled a diverse and balanced approach that minimized medical or health management bias. This procedure yielded an initial MIL Index with 11 items.

Content validity
A four-point ordinal rating scale was used to rate each of the 11 items: "1" = not relevant, "2" = unable to assess relevance without item revision, "3" = relevant but needs minor alteration, "4" = very relevant and succinct. The item evaluation content validity index [24] calculations were applied to both the items and the entire instrument with an a-priori requirement of 3 points with four panel votes.
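The screening rule described above can be sketched in a few lines. The example below computes an item-level content validity index as the proportion of panel ratings of 3 or 4, and applies one possible reading of the a-priori requirement (at least four votes and a mean rating of at least 3). The function names, this reading of the rule, and the ratings themselves are assumptions made for illustration; they are not the study's data or software.

```python
from statistics import mean


def item_cvi(ratings):
    """Proportion of panel ratings that are 3 or 4 on the four-point scale."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)


def meets_requirement(ratings, min_votes=4, min_mean=3.0):
    """Assumed reading of the a-priori rule: at least `min_votes` ratings
    were cast and their mean is at least `min_mean`."""
    return len(ratings) >= min_votes and mean(ratings) >= min_mean


# Hypothetical ratings from a five-member panel (not study data).
panel_ratings = {
    "pain on trunk flexion": [4, 4, 3, 4, 3],
    "hypothetical weak item": [2, 1, 3, 2, 2],
}

for item, ratings in panel_ratings.items():
    print(item, item_cvi(ratings), meets_requirement(ratings))
```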

Face validity
A 5-point numerical rating scale was used (0 = not easy, 4 = very easy) to evaluate item accuracy, comprehensiveness and ease of response with an a-priori requirement of 3 points.

Phase 2: Mechanical and inflammatory LBP index (MIL) validation

Design
A prospective observational study investigated the responses of participants (n = 170) recruited for the study. Three instruments and one physical test were administered: the Roland-Morris Questionnaire (RMQ), the Short-form Health Status survey (SF-12) and the newly created MIL. The "Backache Index" (BAI) was used as the physical test. The evaluators were two physiotherapists with more than 2 years of professional experience. For test-retest reliability, two separate test sessions with a three-day interval were used on a subgroup of participants (n = 37). On each test occasion the assessor performing the second assessment was blinded to the original scores to ensure independent data collection.

Patients and setting
The participants (n = 170, 38 ± 14 years-old, n = 96 females) were diagnosed with NSLBP using Waddell's classification for acute and chronic conditions [20] by a general practitioner (GP), and then were referred to two Spanish physiotherapy outpatient clinics. Exclusion criteria were refusal to participate in the study, LBP as a result of a specific spinal disease, infection, presence of a tumor, osteoporosis, fracture, structural deformity, inflammatory disorder, radicular symptoms or cauda equina syndrome. The study was authorized by the Ethics and Research Committee of the Faculty of Medicine at Malaga University. All participants gave written informed consent, confidentiality and anonymity were preserved at all times, and the principles of the "Declaration of Helsinki" and its subsequent updates were respected.
The standardized measures administered in the study are described below. The MIL was administered as the 11-item draft. The items used in each subsection are: 1) Mechanical: pain on trunk flexion, pain on lateral bending and palpation pain (spinous process); 2) Inflammatory: intermittent pain during the day, morning pain on waking and initial getting up, stiffness after resting and pain on repetitive bending. Scoring uses the standardized scores obtained with regression methods from the factor analysis.

Physical tests used in the study
The "Backache Index" (BAI) [29] determines the physical status from a single test of 5 simple trunk movements of a patient standing still in erect position: (1) flexion (with knee flexion limited to 10 degrees), (2) bilateral sideflexion to the left and (3) to the right, and (4) bilateral combined extension and lateral flexion to the left and (5) to the right. Observer assessment is performed by means of scoring pain factors obtained by asking the patient, and stiffness estimation at the end of the 5 trunk motions assessed by a physiotherapist according to the BAI criteria [29]. The results are recorded with a four-point score per outcome (0-3 points) and the sum of the five outcomes yields the BAI with a maximum of 15 points. Reliability coefficients of the Spanish version of BAI were excellent (n = 42; ICC = 0.97 at three-day follow-up) [30].
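The BAI total described above is a simple sum, which the following minimal sketch makes explicit. It assumes the five outcomes are supplied in the order listed in the text, each already scored 0-3 by the assessor; the function name and the example scores are hypothetical.

```python
def backache_index(outcome_scores):
    """Sum the five BAI outcomes (each scored 0-3) into a total of 0-15."""
    if len(outcome_scores) != 5:
        raise ValueError("the BAI uses exactly five trunk-movement outcomes")
    if any(score not in (0, 1, 2, 3) for score in outcome_scores):
        raise ValueError("each outcome must be an integer from 0 to 3")
    return sum(outcome_scores)


# Hypothetical patient: flexion, side flexion left/right,
# combined extension and lateral flexion left/right.
example_scores = [2, 1, 1, 3, 2]
print(backache_index(example_scores))
```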

Statistical analyses
LISREL v.8.0 and the Statistical Package for the Social Sciences (SPSS) v.17.0 were used to compute the statistical analyses. The factor structure, internal consistency, and construct validity were assessed from the full sample. The test-retest reliability was assessed through the intraclass correlation coefficient (ICC), type (2,1), expressed with 95%CI, using scores on the MIL from participants at baseline and three days later during a non-treatment period. Participants' ratings on an 11-point numerical rating scale (NRS) of perceived overall status at baseline and on day three provided the reference criterion to determine change. The size of the subsample of participants (n = 37) for test-retest reliability was determined from power analysis calculations based on the sample size attributes [31]. The participants were initially randomized into two equal groups for the purpose of cross-sample validation, allowing for exploratory factor analysis (Maximum Likelihood using Oblimin rotation and Kaiser's normalization) with one half and confirmatory factor analysis with the other.
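For readers unfamiliar with the ICC type (2,1), the sketch below computes it from the two-way ANOVA mean squares following the standard Shrout and Fleiss definition (two-way random effects, absolute agreement, single measurement). It is an illustration only and is not the SPSS routine used in the study; the function name and the baseline and day-three scores are hypothetical.

```python
def icc_2_1(data):
    """ICC(2,1) for `data`, a list of rows (one per participant),
    each row holding the k repeated measurements for that participant."""
    n = len(data)
    k = len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Sums of squares for participants (rows), occasions (columns), and error.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical baseline and day-three scores for five participants.
scores = [[10, 11], [14, 13], [7, 8], [12, 12], [9, 10]]
print(round(icc_2_1(scores), 2))
```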
The "Root Mean Square Error of Approximation" (RMSEA), the "Comparative Fit Index" (CFI), and the "Normed Fit Index" (NFI) were used to evaluate the model fit. For the RMSEA, ≤0.08 reflects a reasonable fit [32]. The NFI and CFI vary along a continuum of 0 to 1, with ≥0.90 being satisfactory [33]. Since the components/factors of signs and symptoms of LBP are continuous variables and the factor loadings obtained by CFA cannot be used directly to assess the MLBP and ILBP factors, an MLBP and ILBP index was developed. This is calculated as the sum of the standardized scores, obtained with regression methods, of the two factors that comprise our proposed model.
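The final step, summing the standardized scores of the two factors, can be sketched as follows. This is only a schematic illustration: it assumes that per-participant factor scores for MLBP and ILBP have already been estimated (for example by the regression method in the statistics package), and that standardization uses the sample mean and the population standard deviation. The function names and the example values are hypothetical, and the published index may differ in detail.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to z-scores (population standard deviation assumed)."""
    centre = mean(values)
    spread = pstdev(values)
    return [(v - centre) / spread for v in values]


def combined_index(mlbp_scores, ilbp_scores):
    """Sum the standardized MLBP and ILBP factor scores per participant."""
    z_mlbp = standardize(mlbp_scores)
    z_ilbp = standardize(ilbp_scores)
    return [m + i for m, i in zip(z_mlbp, z_ilbp)]


# Hypothetical factor scores for four participants (not study data).
mlbp = [0.2, -1.1, 0.9, 0.0]
ilbp = [1.3, -0.4, 0.1, -1.0]
print([round(value, 2) for value in combined_index(mlbp, ilbp)])
```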
To determine whether the MIL instrument measures relatively specific constructs, the corrected item-total correlations were examined. The internal consistency of the dimensions was then determined by means of Cronbach's α. Test-retest reliability was assessed at three days during a period of no treatment [34]. Convergent validity was assessed by correlating the BAI, SF-12, RMQ and MIL measures. Discriminant validity was determined by examining the area under the curve (AUC) of the receiver operating characteristic (ROC) curves [35].
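As an illustration of the internal consistency step, Cronbach's α can be computed from the item variances and the variance of the summed score, α = k/(k − 1) · (1 − Σ item variances / variance of totals). The sketch below uses sample variances; the function name and the item responses are hypothetical and are not the study data.

```python
from statistics import variance


def cronbach_alpha(item_columns):
    """Cronbach's alpha for a list of item columns, where each column holds
    the scores of the same respondents on one item."""
    k = len(item_columns)
    totals = [sum(scores) for scores in zip(*item_columns)]
    sum_item_variances = sum(variance(column) for column in item_columns)
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Hypothetical responses of six participants to three items of one dimension.
items = [
    [1, 2, 3, 2, 4, 3],
    [2, 2, 3, 1, 4, 4],
    [1, 3, 2, 2, 3, 4],
]
print(round(cronbach_alpha(items), 2))
```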

Sample size
The minimum sample sizes for the validation study were verified from the results as determined from an 80% chance of detecting goodness of fit with an effect size w = 0.5, alpha = 0.05 and power (1 − beta) = 0.80, allowing for 15% attrition. This gave minimum samples for convergent validity (n = 61), test-retest reliability (n = 36), discriminant validity (n = 52) and the pooled samples for internal consistency and factor analysis (n > 100) [31].

Practical characteristics
Readability was assessed using the Flesch-Kincaid grading scale, a recognised measurement standard that is obtained within the grammar section of most standard word-processing software [36]. Missing responses were determined from all participant responses. Completion and scoring times were determined respectively from participants and clinicians from the average of three separate scores.
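For reference, the two readability statistics reported by word-processing software follow the standard Flesch formulas, which depend only on word, sentence and syllable counts. The sketch below assumes these counts are already available; the function names and the counts used are hypothetical and do not describe the MIL text.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher values indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a short questionnaire text.
word_count, sentence_count, syllable_count = 100, 10, 150
print(flesch_reading_ease(word_count, sentence_count, syllable_count))
print(flesch_kincaid_grade(word_count, sentence_count, syllable_count))
```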

Content validity
The 27 signs and symptoms items were reduced to an initial set of 11 through panel feedback and consensus agreement, as detailed in Table 1. The reduction to the final set of seven items was achieved through factor analysis, where four items were removed to leave the final MIL 7-item version. Two content validity index calculations were performed on both the items and the complete questionnaire to determine whether an item would be removed due to cross-loading (the presence of an item in both dimensions where loading is > 0.40).

Face validity
All panel members agreed on the MIL being suitably indicative of a questionnaire to determine the presence of mechanical or inflammatory symptoms. All participants were able to complete the MIL without missing responses or additional assistance.

Phase 2: MIL validation

Psychometric characteristics

Factor analyses
Four items loaded at >0.40 in both dimensions and were removed for cross-loading: "Pain when standing for a while"; "Pain on trunk extension"; "Palpatory pain of muscles"; and "Pain getting out of a chair". A flow chart of how the final MIL version was constructed and reduced from the initial 27 items to 7 items is presented (Figure 1).
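The cross-loading rule can be expressed as a simple filter over the rotated loading matrix. In the sketch below the 0.40 cut-off comes from the text, while the function name and every loading value are invented for illustration; the study's actual loadings are those reported in Table 2.

```python
def cross_loading_items(loadings, threshold=0.40):
    """Return the items whose absolute loading exceeds `threshold` on both factors."""
    return [
        item
        for item, (factor_1, factor_2) in loadings.items()
        if abs(factor_1) > threshold and abs(factor_2) > threshold
    ]


# Hypothetical rotated loadings (factor 1, factor 2); not the values in Table 2.
example_loadings = {
    "pain on trunk flexion": (0.71, 0.12),
    "morning pain on waking": (0.08, 0.66),
    "pain on trunk extension": (0.48, 0.45),
}
print(cross_loading_items(example_loadings))
```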
The Kaiser-Meyer-Olkin measure produced a coefficient of 0.68, indicative of sampling adequacy, and Bartlett's Test of Sphericity reached statistical significance (p < 0.001), both supporting the factorability of the correlation matrix. There were two factors prior to the inflection point in the scree test, with eigenvalues >1.0, item variance >5% [31], and a total cumulative variance of 51.7%. The rotated two-factor solution showed strong loadings (Table 2).
The CFA of the two-factor model yielded a non-significant χ2 test (χ2 = 14.80, df = 13, p = 0.37). The other fit indices were very satisfactory (NFI = 0.97, CFI = 0.98, and RMSEA = 0.029) (Figure 2) and the factor loadings of all variables were >0.40. The correlation coefficient between the two dimensions was 0.56, suggesting a moderate relationship.
Item-total factor correlations (Kendall's tau) are shown in Table 3. The items "morning pain on waking" and "pain on repetitive bending" both correlate highly with the ILBP component of the MIL questionnaire, while "pain on trunk flexion" and "pain on lateral bending" are more related to the MLBP component (Table 3). Cronbach's α for the MLBP and ILBP factors was modest, at 0.68 and 0.72 respectively. The development of a combined index is justified given that the two factors are significantly and moderately associated. The MIL index is a pragmatic sum of the standardized scores, obtained with regression analysis, of the two factors.
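As a rough check on the fit statistic, the RMSEA can be approximated from the reported χ2 and degrees of freedom with the commonly used formula sqrt(max(χ2 − df, 0) / (df · (N − 1))). The sketch below is not the LISREL computation, and the sample size N = 170 is an assumption, since the text does not state which sample size entered this calculation.

```python
import math


def rmsea(chi_square, degrees_of_freedom, sample_size):
    """Point estimate of the RMSEA from a model chi-square statistic."""
    excess = max(chi_square - degrees_of_freedom, 0.0)
    return math.sqrt(excess / (degrees_of_freedom * (sample_size - 1)))


# chi-square and df as reported; N = 170 is an assumed sample size.
estimate = rmsea(14.80, 13, 170)
print(round(estimate, 3))
```

Under this assumption the estimate rounds to 0.029, in line with the reported value and well below the 0.08 threshold for a reasonable fit.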

Baseline responses and test-retest reliability
Baseline responses demonstrated a normal distribution for the 7 items. Normality was assessed, and the means and variability of all measures are presented (Table 4).

Figure 2. The pathways, factor loadings and goodness-of-fit indexes of the two-factor structure underlying the MIL.

Convergent validity
The correlations of the ILBP factor with the RMQ and BAI measurements were practically identical but weak (r = 0.34, p < 0.001). The instruments that correlated weakly with the MLBP factor were the PCS, RMQ and BAI (r = 0.38, p < 0.001). Taking the ILBP and MLBP factors together, a significant but weak correlation is seen with the BAI and the RMQ, whereas correlations with the SF-12 PCS and SF-12 MCS are virtually non-existent (Table 5), apart from a very weak correlation between the PCS and the combined MIL score.

Discriminant validity
The ROC analyses indicated that the AUCs (expressed as 95% confidence intervals) for the specific low back pain questionnaires ranged from 0.74 to 0.92 for the RMQ and from 0.51 to 0.65 for the BAI. In general, no significant results were noted, with the exception of the ILBP factor and the combined ILBP plus MLBP factors in the case of the RMQ with the state variable set at 20%.
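To make the AUC values easier to interpret, the sketch below computes the area under the ROC curve as the probability that a randomly chosen participant in the positive state scores higher than a randomly chosen participant in the negative state, counting ties as one half (the Mann-Whitney formulation). How the state variable was dichotomized in the study is not reproduced here; the function name and the scores are hypothetical.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the proportion of positive/negative pairs ranked correctly,
    with ties counted as half a correct ranking."""
    correct = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positive_scores) * len(negative_scores))


# Hypothetical factor scores for participants above and below a disability cut-off.
above_cutoff = [1.4, 0.9, 0.3, 1.1]
below_cutoff = [0.2, -0.5, 0.9, -1.0]
print(roc_auc(above_cutoff, below_cutoff))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.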

Practical characteristics
Readability was acceptable with a Flesch-Kincaid grade level at 6.8 and 68.5% reading ease.
Missing responses were acceptable with four responses found in three questions (1, 2, and 4) at a frequency of 5%. Completion time was 6.57 ± 3.03 minutes.

Discussion
The findings of this study indicated that the MIL had high reliability and the ability to adequately discriminate patients into two subgroups of MLBP and/or ILBP.
The MLBP characteristics were 'Pain trunk flexion', 'Pain lateral bending' and 'Palpation pain of vertebrae'. The ILBP characteristics were 'Morning pain on waking and initial getting up', 'Pain repetitive bending', 'Intermittent pain during day' and 'Stiffness after resting'.
Provocative symptoms from MLBP elicited by lateral bending may stem from either inflammation of thoracolumbar spine articulations, such as disco-vertebral and facet joints, and/or from muscle strain. For ILBP, initiating movements may stress inflamed and swollen soft tissues as well as the local lumbar and sacro-iliac joints, even if no radiological anatomic spine or pelvic abnormalities are evident [37].
Walker and Williamson [23], in their study of NSLBP patients, found high levels of agreement that morning pain on activity was an indicator of ILBP, while pain when lifting suggested MLBP. In this study the ILBP corresponded to "morning pain on waking", while for MLBP the two trunk elements, "pain on lateral bending" and "pain on trunk flexion", corresponded with the BAI. Consequently, the combination of these mechanical and inflammatory indicators in the MIL index should be able to distinguish between ILBP and MLBP and confirm the approach of Walker and Williamson [23]. This supports the concept of differentiation of NSLBP into these two subgroups.
No strong correlation was found between ILBP and the SF-12 factors (PCS and MCS measures); there was only a weak correlation between the combined MIL components and the PCS score. This confirms the findings of a previous study by Moix et al. [38], where very weak associations were found between chronic LBP and mental health status. The predictive ability of the MIL questionnaire for functional disability was moderate to high.
A pilot study by Riskman et al. [39] that employed the mechanical and inflammatory LBP analogue instrument was unable to effectively categorize the majority of patients into ILBP or MLBP. The MIL, by contrast, has employed a method that appears more effective at discriminating between these aspects. This may help the clinical decision process regarding the type of loading treatment (pharmacological or mechanical) that would be more effective for patients when the symptom profile is taken into account. This should increase the adequacy of treatment interventions provided to patients.

Weaknesses and strengths
It is acknowledged that the difference between acute and chronic NSLBP is probably responsible for the weak results in the convergent validity. The mix of patients with acute NSLBP represents a bias towards patients with flexion problems, while chronic NSLBP represents a bias towards general stiffness [40]. These factors may have increased the variability of the results. The selection of symptomatic items was based on the opinions of the panelists and was not assessed through an experimental investigation. The strength of this study is that it supports the reliability of the new MIL questionnaire system and its ability to distinguish ILBP from MLBP subgroups of NSLBP patients.

Implications and future directions
Our results suggest that the MIL can pragmatically distinguish NSLBP into subgroups of mechanical and inflammatory symptoms. This is achieved through a continuous index based on the components of a two-factor model obtained through CFA. The MIL should be able to offer a standard clinical frame of reference. Furthermore, in order to help clinicians obtain immediate results based on raw patient data, we have developed a software application to provide the index values (see http://www.salud.uma.es/calculaMIL/).
Our study may lead to improvements in the understanding and assessment of mechanical and inflammatory NSLBP. It supports the view that a two-factor model underlies NSLBP and that clinicians can use a simple index to distinguish between these two subgroups. Further research is needed to determine the generalizability and cross-cultural validity of the MIL. It has potential utility in patient assessment and treatment evaluation as well as the ability to provide clinicians with a quick assessment to distinguish between mechanical and inflammatory NSLBP components. Such research may assist in the demonstration of the value of this new MIL procedure in the clinical setting.

Conclusions
The findings of this study suggest that the MIL, in this initial stage of research, is a valid and reliable tool for distinguishing between mechanical and inflammatory LBP. While earlier similar studies could not establish the difference between mechanical and inflammatory LBP, the newly elaborated MIL scale gives clinicians the opportunity to decide in which direction treatment options should be considered. The main shortcoming of this study is that both acute and chronic NSLBP patients were included. Consequently, further studies are needed to assess the generalizability and cross-cultural validity of our findings.