Introduction

The Oswestry Disability Index (ODI) [1] and the Neck Disability Index (NDI) [2] are commonly used condition-specific instruments for assessing pain and disability in patients with spinal disorders. Alternative methods for calculating ODI/NDI scores have been reported [1, 3]. For example, the handling of missing values may differ, and some items may be excluded (e.g., ODI item 8 about sex life or NDI item 8 about driving). If an instrument is modified, e.g., by omitting items or by changing scoring algorithms, it is important to evaluate the consequences of these changes in relation to the original version of the instrument. It is also essential to describe the changes, and their consequences, thoroughly when reporting the results of scientific studies based on the modified instrument. This is important because such data may serve as a basis for decisions when advocating specific treatments.

The ODI has been used in Swespine for assessment of disability since 2003 and the NDI since 2006. In case of missing items, the ODI and NDI scoring algorithms should be adjusted according to the formula: (total score) × 100/(5 × number of items answered). This adjustment has the same effect as mean imputation of the missing items. In Swespine, the scoring adjustment was not used between 2006 and April 2022, i.e., the ODI and NDI were always calculated as (total score) × 100/50. The primary aim of the current study was to evaluate the practical implications of the unadjusted ODI and NDI scoring algorithms. As there is no generally accepted recommendation on the number of missing items (a limit of 2 has been recommended only for the NDI [3]), the secondary aim was to compare the ODI and NDI score changes when adjusting for at most 1 or 2 missing items, i.e., 10% or 20% of the items.
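The statement that the adjusted formula is equivalent to mean imputation can be checked with a minimal Python sketch. The function names and the example response are hypothetical; a response is assumed to be a list of 10 item scores coded 0 to 5, with None marking a missing item.

```python
from statistics import mean


def adjusted_score(items):
    """Adjusted scoring: (total score) x 100 / (5 x number of items answered)."""
    answered = [x for x in items if x is not None]
    return sum(answered) * 100 / (5 * len(answered))


def mean_imputed_score(items):
    """Fill each missing item with the mean of the answered items,
    then score all 10 items as (total score) x 100 / 50."""
    answered = [x for x in items if x is not None]
    fill = mean(answered)
    return sum(fill if x is None else x for x in items) * 100 / 50


def unadjusted_score(items):
    """Unadjusted scoring: missing items effectively count as 0."""
    return sum(x for x in items if x is not None) * 100 / 50


# Hypothetical response with item 8 missing.
response = [3, 2, 4, 1, 3, 2, 3, None, 2, 3]
print(round(adjusted_score(response), 1))      # 51.1
print(round(mean_imputed_score(response), 1))  # 51.1
print(unadjusted_score(response))              # 46.0
```

In this example with one missing item, the unadjusted score is 9/10 of the adjusted score.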

Patients and methods

Study design

The present study was register-based, using prospectively collected longitudinal data from Swespine, the national Swedish spine register.

The Oswestry disability index (ODI)

The ODI is a single-dimensional, 10-item, self-administered instrument for the assessment of disability (pain, personal care, ability to walk, etc.) [1]. Each item has 6 possible answers coded on the ordinal scale 0 to 5 (0 being the best and 5 the worst). The final score is calculated as: (total score) × 100/(5 × number of items answered) [1]. There is currently no recommendation on the acceptable number of missing items.

The neck disability index (NDI)

The NDI is a single-dimensional, 10-item, self-administered instrument for the assessment of disability (pain, personal care, lifting, etc.) [2]. Each item has 6 possible answers coded on the ordinal scale 0 to 5 (0 being the best and 5 the worst). Results are reported as a score out of 50 or a percentage out of 100 [3]. Swespine reports NDI as a percentage out of 100. In case of missing items, Vernon [3] recommends mean imputation of at most 2 missing items.

The Swespine ODI and NDI scoring algorithms prior to April 2022

In Swespine, the ODI and NDI results have been reported as (total score) × 100/50, i.e., with no adjustment for missing items. For the ODI, there was a minor inconsistency in the handling of missing values: in principle, no ODI score was to be calculated if more than 2 items were missing, but scores with more than 2 missing items were nevertheless calculated in 0.15% of the cases. For the NDI, any number of missing items was allowed.
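The pre-April-2022 Swespine rules described above can be sketched as follows. This is an illustration written from the description in this section, not Swespine's actual code; the function name and the list-with-None representation of a response are assumptions, and the 0.15% of ODI cases scored despite more than 2 missing items are not modeled.

```python
def swespine_legacy_score(items, instrument):
    """Pre-April-2022 Swespine rule as described in the text:
    (total of answered items) x 100 / 50, with no adjustment for missing items.
    ODI: no score if more than 2 items are missing. NDI: no limit
    (so, taken literally, even an empty NDI response would score 0)."""
    n_missing = sum(1 for x in items if x is None)
    if instrument == "ODI" and n_missing > 2:
        return None
    return sum(x for x in items if x is not None) * 100 / 50


response = [2, 2, 2, None, None, None, 2, 2, 2, 2]  # hypothetical, 3 items missing
print(swespine_legacy_score(response, "ODI"))  # None
print(swespine_legacy_score(response, "NDI"))  # 28.0
```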

Patient data set

Patients surgically treated between 2003 and 2019 (17 years) for lumbar degenerative conditions, i.e., central spinal stenosis with or without degenerative spondylolisthesis (DS), lateral spinal stenosis, degenerative disk disease (DDD) or disk herniation, and patients surgically treated between 2006 and 2019 (14 years) for cervical degenerative conditions, i.e., radiculopathy from disk herniation or foraminal stenosis and myelopathy from disk herniation or central stenosis, were identified in the Swedish spine register. Preoperative and 1-year postoperative patient-reported ODI and NDI data were used. Patients with missing preoperative or 1-year postoperative ODI/NDI were excluded from the analysis.

Primary analysis

The primary analysis compared ODI/NDI total scores calculated with the unadjusted Swespine scoring algorithms and with the original adjusted scoring algorithms accepting at most 2 missing items (referred to as scoring algorithms O2 and N2).

Secondary analysis

The secondary analysis compared ODI/NDI total scores calculated with the original adjusted scoring algorithms accepting at most 1 missing item (algorithms O1 and N1) and at most 2 missing items (algorithms O2 and N2).
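The algorithms compared in the primary and secondary analyses can be sketched as one parameterized function plus the unadjusted Swespine rule. This is a minimal illustration under the assumption that a response is a list of 10 item scores with None for missing items; the function names and example responses are hypothetical.

```python
def original_score(items, max_missing):
    """Original adjusted scoring: total x 100 / (5 x items answered).
    max_missing=1 corresponds to O1/N1 and max_missing=2 to O2/N2.
    Returns None (no score) when more than max_missing items are missing."""
    answered = [x for x in items if x is not None]
    if len(items) - len(answered) > max_missing:
        return None
    return sum(answered) * 100 / (5 * len(answered))


def swespine_score(items):
    """Unadjusted Swespine scoring: total of the answered items x 100 / 50."""
    return sum(x for x in items if x is not None) * 100 / 50


# Three hypothetical responses: complete, 1 missing item, 2 missing items.
responses = [
    [2] * 10,
    [3, 3, 3, 3, 3, 3, 3, None, 3, 3],
    [1, 1, 1, 1, 1, 1, 1, None, 1, None],
]

for r in responses:
    # Columns: Swespine, O1 (at most 1 missing), O2 (at most 2 missing)
    print(swespine_score(r), original_score(r, 1), original_score(r, 2))
```

With these responses the three rules agree for the complete response, the Swespine score is lower whenever an item is missing, and O1 gives no score for the response with 2 missing items, which O2 still scores.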

Simulated data set with small patient populations

To evaluate how the distributions of missing ODI and NDI values in Swespine affect smaller data sets, we drew 2 random samples of 50 patients from Swespine for each diagnosis (5 lumbar and 4 cervical), giving 18 samples in total, and analyzed ODI and NDI outcomes in each sample using the Swespine and the O1/O2 and N1/N2 algorithms. The analyses were performed in R (R Foundation for Statistical Computing, Vienna, Austria, 2017).
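The sampling step can be sketched in Python, although the study itself used R. The diagnosis labels and patient identifiers below are placeholders, the seed is arbitrary, and drawing the two samples per group independently (so they may share patients) is an assumption; each sample would then be scored with the Swespine, O1/O2 and N1/N2 algorithms.

```python
import random


def draw_samples(patients_by_diagnosis, n_samples=2, sample_size=50, seed=42):
    """For each diagnostic group, draw n_samples independent random samples,
    each of sample_size distinct patients (sampling without replacement)."""
    rng = random.Random(seed)
    samples = []
    for diagnosis, patient_ids in patients_by_diagnosis.items():
        for k in range(1, n_samples + 1):
            samples.append((diagnosis, k, rng.sample(patient_ids, sample_size)))
    return samples


# Placeholder register: 5 lumbar and 4 cervical groups with made-up patient IDs.
groups = [f"lumbar_{i}" for i in range(1, 6)] + [f"cervical_{i}" for i in range(1, 5)]
register = {g: [f"{g}_p{j}" for j in range(500)] for g in groups}

samples = draw_samples(register)
print(len(samples))  # 18
```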

Example studies

To illustrate how previously published ODI data might be affected by different ODI scoring algorithms, we recalculated ODI using the O2 algorithm (instead of the Swespine algorithm) for 2 previously published studies [4, 5] by the first author of the current study.

Statistics

Data are presented as mean and standard deviation (SD) and/or 95% confidence intervals (CIs). Bootstrapping was used to calculate CIs [6]. Standardized response means (SRMs) for paired data, i.e., the mean of the paired differences divided by the standard deviation of the differences, were used to evaluate effect sizes [7]. The SRMs were interpreted as follows: < 0.2 no effect, 0.2 to 0.4 small effect, 0.5 to 0.7 moderate effect and > 0.7 large effect [7].
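A minimal sketch of the SRM and a bootstrap CI for paired data is given below. The text does not state which bootstrap variant or how many resamples were used; the percentile method, 2000 resamples, the seed and the sign convention (pre minus post, so that improvement gives a positive value) are assumptions, and the data are hypothetical.

```python
import random
import statistics


def srm(pre, post):
    """Standardized response mean: mean of the paired differences divided by
    their standard deviation. Differences are pre - post (assumed convention)."""
    diffs = [a - b for a, b in zip(pre, post)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def bootstrap_ci(pre, post, stat, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for a paired statistic, resampling patients
    (pre/post pairs kept together) with replacement."""
    rng = random.Random(seed)
    n = len(pre)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(stat([pre[i] for i in idx], [post[i] for i in idx]))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def mean_change(pre, post):
    """Difference in means between preoperative and 1-year scores."""
    return statistics.mean(pre) - statistics.mean(post)


pre = [44, 52, 38, 60, 30, 48]   # hypothetical preoperative scores
post = [20, 30, 22, 40, 18, 26]  # hypothetical 1-year scores
print(round(srm(pre, post), 2))
print(bootstrap_ci(pre, post, mean_change))
```

With register-sized samples, `srm` can be passed as `stat` in the same way; in very small samples a resample with identical differences would make the SD zero.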

Results

Baseline patient characteristics

The demographic characteristics of the study population are shown in Tables 1 and 2.

Table 1 Demographic characteristics of the ODI study population
Table 2 Demographic characteristics of the NDI study population

Missing items

The distributions of missing items in the ODI/NDI are shown in Tables 3 and 4. More than 1 missing item was uncommon (less than 5% of responses). The most commonly missing ODI item was item 8, regarding sex life (Supplementary Table S1A). The most commonly missing NDI item was item 8, about driving (Supplementary Table S1B). The preoperative and 1-year postoperative ODI and NDI scores are shown in Tables 5 and 6 and Supplementary Figs. S5A and S5B.

Table 3 Number of missing ODI items in Swespine for degenerative conditions of the lumbar spine
Table 4 Number of missing NDI items in Swespine for degenerative conditions of the cervical spine
Table 5 ODI preoperatively and year 1 postoperatively in Swespine for patients surgically treated for degenerative conditions of the lumbar spine
Table 6 NDI preoperatively and year 1 postoperatively in Swespine for patients surgically treated for degenerative conditions of the cervical spine

Differences in score change

The preoperative as well as the 1-year postoperative ODI and NDI scores were approximately 1 unit smaller with the Swespine scoring algorithm than with the O1/O2 and N1/N2 scoring algorithms (Tables 5 and 6). The differences between preoperative and postoperative ODI/NDI scores were similar for the Swespine and the O1/O2 and N1/N2 scoring algorithms. The 95% CIs overlapped, indicating no statistically significant differences between the preoperative–postoperative differences. The effect sizes (SRMs) were also similar, with overlapping 95% CIs, indicating no statistically significant differences (Tables 5 and 6). Likewise, the results were similar, with overlapping 95% CIs, regardless of whether 1 or 2 missing items were accepted in the ODI/NDI scores (Tables 5 and 6). The results of the simulation study with 50 randomly selected patients in each group are reported in Supplementary Figs. S1 to S4. The CIs were wider for these smaller data sets, and no statistically significant differences were found between the scoring algorithms for any outcome variable.

Example studies

The ODI scores were approximately 1 unit smaller with the Swespine scoring algorithm than with the O2 scoring algorithm. This 1-unit difference did not affect the conclusions of the studies (Supplementary Table S2 and Supplementary Figs. S6A and S6B).

Discussion

In the present study, we used a large real-life data set to evaluate different scoring algorithms for the ODI and NDI. The algorithms differ in their handling of missing items. The main finding was that the Swespine algorithms underestimated the ODI and NDI scores by approximately 1 unit out of 100 compared with the original algorithms. There were no statistically significant differences in the scores or effect sizes when using the different algorithms.

For the ODI, the most frequently missing item was item 8, about sex life (Supplementary Table S1A). This agrees with validation studies in several languages, which found that ODI item 8 is the most frequently missing item and that more than 1 missing item is uncommon [8,9,10,11,12,13,14]. The proportions of responses with no missing items in the 5 diagnostic groups were 75–90% (Table 3). The corresponding values reported in previous validation studies were 60–85% [8,9,10,11,12,13,14]. The number of missing items differed between diagnoses. Missing items were most frequent for lumbar spinal stenosis (ODI) and cervical central spinal stenosis (NDI) (Tables 3 and 4). However, fewer than 5% of the responses for all diagnoses had more than 1 missing item. In summary, the distribution and frequency of missing ODI items were similar to those reported in previous studies. Concerning the NDI, previous reports on missing items are inconsistent. Some authors have reported a high rate of missing answers to item 8, about driving [15, 16], while others have reported only negligible numbers of missing items [17]. In our data, item 8, about driving, was the most commonly missing item.

Kent and Lauridsen [18] studied the effects of missing items on the Roland Morris Disability Questionnaire (RMDQ) and compared them with the ODI. The frequency of missing ODI items was found to be 13.9% (59 of 424). For the RMDQ, the authors recommended using an adjustment algorithm similar to the ODI adjustment model rather than simply ignoring missing items.

Few studies have specified in detail how missing items in the ODI are handled. Fairbank and Pynsent [1] give an example of how 1 missing item is handled. The Danish validation study by Comins et al. [19], based on the Danish spine register (DaneSpine), excluded all patients with more than 1 missing item in the ODI. Given the low frequency of 2 or more missing items in our study (Table 3), the conservative Danish approach would result in only minor reductions in sample size compared with allowing 2 missing items, as demonstrated in our current comparison (Table 5). For the NDI, Vernon [3] states that if 3 or more items are missing, the score may be invalid.

We found no statistically significant differences between the Swespine and the O1/O2 and N1/N2 algorithms for the difference between year-1 and preoperative data (Tables 5 and 6). In addition, we found no statistically significant differences between the Swespine and the O1/O2 and N1/N2 algorithms for the SRM (Tables 5 and 6). This means that the magnitudes of the Swespine underestimations are similar for preoperative and year-1 data, and consequently, the differences are not affected by the Swespine underestimation.
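The source of the underestimation follows directly from the two scoring formulas given in the Introduction. For a respondent with m missing items (m ≤ 2) and a total score T over the 10 − m answered items, and writing the change as preoperative minus 1-year score:

```latex
S_{\mathrm{orig}} = \frac{100\,T}{5\,(10-m)}, \qquad
S_{\mathrm{Swe}} = \frac{100\,T}{50} = 2T
\quad\Longrightarrow\quad
S_{\mathrm{Swe}} = \left(1 - \frac{m}{10}\right) S_{\mathrm{orig}}

\Delta_{\mathrm{Swe}} = S_{\mathrm{Swe}}^{\mathrm{pre}} - S_{\mathrm{Swe}}^{\mathrm{1y}}
= \Delta_{\mathrm{orig}}
  - \frac{m_{\mathrm{pre}}\,S_{\mathrm{orig}}^{\mathrm{pre}} - m_{\mathrm{1y}}\,S_{\mathrm{orig}}^{\mathrm{1y}}}{10}
```

For example, a respondent with one missing item and an original score of 40 is scored 36 by the Swespine algorithm, while respondents with no missing items are scored identically. The bias in the mean change is therefore the difference between the preoperative and 1-year cohort averages of m·S_orig/10, which is small when the underestimations at the two time points are of similar magnitude.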

The 95% CIs for all preoperative and year-1 point estimates are narrow (Tables 5 and 6, Supplementary Figs. S5A and S5B) because of the large sample sizes. The CI widths are less than 2 for all preoperative and year-1 ODI point estimates (Table 5) and less than 3 for all preoperative and year-1 NDI point estimates, except for myelopathy from cervical disk herniation, where the smaller sample size gives CI widths of less than 6 (Table 6). Consequently, the effect of the Swespine underestimation is unlikely to be larger than a few units.

We found no statistically significant differences between algorithms O1 and O2 for the ODI or between algorithms N1 and N2 for the NDI (Tables 5 and 6). Based on this observation, it seems reasonable to accept mean imputation of at most 2 missing items in the ODI and NDI. The Swespine ODI and NDI algorithms have now been changed to include this imputation (algorithms O2 and N2 have been used since April 2022), and historical data have been adjusted accordingly. Because the differences are minimal, we do not consider this to be a problem for previously published studies using the original Swespine algorithm. In addition, our recalculations of the ODI for 2 previously published studies, with up to 6532 included patients, suggest that the ODI differences are minimal. To ensure transparency, all authors who have used the original Swespine ODI/NDI algorithms will be contacted by the Swespine Office.

While there is a clear recommendation to allow at most 2 missing items for the NDI [3], the Mapi Research Trust provides no recommendation for the ODI [20]. Our data suggest that, from a Swedish perspective, there is no major difference between adjusting for 1 or 2 missing items. Nevertheless, to unify the management of missing items nationally and internationally, we propose that the Mapi Research Trust issue a recommendation on the accepted number of missing ODI items.

Conclusion

The Swespine algorithms underestimated the ODI and NDI by approximately 1 unit out of 100 compared with the original algorithms when the algorithms were applied to a large real-life data set. In addition, there were no statistically significant differences between the original algorithms when adjusting for at most 1 or 2 missing items. The Swespine algorithms were changed in April 2022 to the original algorithms (O2 and N2), and historical data have been adjusted accordingly. It is important that studies based on patient-reported outcome measures specify the algorithms used to calculate scores.