Introduction

Low back pain (LBP) is the most common and expensive musculoskeletal disorder in Western countries [1]. The recovery process of persons with chronic LBP is slow, and their demands on the health care system are both large and costly. Costs due to work disability and production losses are even higher. Together with medical costs, these were estimated at US $50 billion per year in the United States (US) [2], US $11 billion in the United Kingdom (UK) [3] and US $5 billion in the Netherlands [4]. For prevention it is necessary to identify which patients are at risk of long-term disability, production losses and sick-leave. If these high-risk patients can be identified, health care providers can take preventive action. This may reduce costs and is therefore of benefit from both a medical and an economic perspective.

A number of systematic reviews have been conducted to determine which variables are related to the development of chronic prolonged functional disability and/or sick-leave [5–8]. The reviews of Crook et al. [5] and Steenstra et al. [6] both reported strong to moderate evidence for the following factors: previous episodes of LBP, higher levels of disability, older age and female gender. The reviews of Linton [7] and Hartvigsen et al. [8] studied the influence of psychosocial factors at work. Linton [7] found evidence for factors such as pain coping and fear avoidance, and Hartvigsen et al. [8] found evidence of no association for social support. Findings were inconsistent between reviews for the variables radiating pain, job satisfaction, and heavier work. These reviews make clear that many studies have aimed to examine the prognosis of functional disability and/or sick-leave, and that they do not agree on the relevant factors. It therefore remains unclear which of these factors, separately or in combination, contribute most to prolonged sick-leave and disability due to LBP. Studies on return to work (RTW) and functional disability need comprehensive multivariate models and have to focus more on the feasibility of using this evidence in clinical practice.

Clinical prediction rules seem attractive for discriminating between patients at high and low risk of a poor outcome, which may support treatment decisions [9]. McGinn et al. [10] have defined four levels of evidence to develop good clinical prediction rules. Level 4 is the lowest level, i.e., the prediction rule is derived and validated in the original but not in another sample. Level 3 indicates that the prediction rule is validated in another small sample. For Level 2 it is necessary that the prediction rule has been shown to be accurate in a large sample or that it has been validated in several other different settings. Level 1, i.e., the highest level, indicates that the use of the prediction rule leads to a change in clinician behaviour and an improvement of patient outcomes (impact analysis). To move successfully through these levels, a prediction rule should preferably be developed using an outcome measure that is relevant for the clinical setting and should consist of predictors that can easily be assessed in daily practice. Furthermore, the predictive ability in terms of explained variance, discrimination and calibration has to be tested, as well as the generalizability to new patients. However, these aspects of a prediction rule are often not well established [9].

The purpose of our study is to develop a clinical prediction rule, in a large dataset of employees on sick-leave due to LBP, that determines the risk of prolonged sick-leave of more than 6 months by using information from a wide range of demographic, work, LBP and psychosocial related factors. The predictive ability of this rule in terms of explained variance, discrimination and calibration will also be validated internally.

Methods

Study Design

Prospective cohort study with merged data (628 patients) from three different RCTs on LBP. The first trial investigated the effectiveness of a behaviorally oriented graded activity program in comparison to usual care (134 patients) [11]. The second trial compared a workplace intervention and graded activity to usual care (195 patients) [12]. The third trial compared high and low intensity back schools with usual care (299 patients) [13].

Study Population

Study patients visited their occupational physician (OP) at one of the participating occupational health services (OHS) when they had been on sick-leave for no more than 8 weeks. All studies had a follow-up of at least 1 year. The population consisted of blue and white-collar workers covering a broad range of professions. Patients were eligible for participation if they met the following inclusion criteria: non-specific LBP, defined as pain localised in the lower back without a specific underlying cause; sick-listed due to LBP (completely or partially) for no more than 8 weeks; between 18 and 65 years of age; ability to complete questionnaires written in Dutch. Exclusion criteria were: specific cause of the LBP; pregnancy; serious psychiatric disorders; legal conflict at work. The study protocol was approved by the Medical Ethics Committee of the VU University Medical Center in Amsterdam, and patients who met the eligibility criteria and who were willing to participate signed an informed consent form.

Outcome

The outcome variable lasting return to work (RTW) was defined as the duration of work absenteeism in calendar days from the first day of sick-leave until full return to own work or other work with equal earnings for at least 4 weeks. In addition to this variable, a status variable was defined. On this variable, participants were assigned a “1” when they fulfilled the RTW outcome variable and a “0”, indicating “censoring”, when they were on sick-leave at 6 months follow-up. Participants who went back on sick-leave before the 4-week period ended, until the end of the study period, were also classified as being on sick-leave. The sick-leave data were collected continuously from the electronic medical records of the OHS [14].
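As a minimal sketch, the duration and status variables described above could be coded as follows. The function name and the use of 182 calendar days for "6 months" are assumptions made for illustration; the exact cut-off in days is not stated here.

```python
from typing import Optional, Tuple

HORIZON_DAYS = 182  # assumed length of "6 months" in calendar days


def rtw_outcome(days_to_lasting_rtw: Optional[int],
                horizon: int = HORIZON_DAYS) -> Tuple[int, int]:
    """Return (time, status) for one participant.

    days_to_lasting_rtw: calendar days from the first day of sick-leave
    until a return to work that lasted at least 4 weeks, or None if no
    lasting return to work was observed.
    status is 1 for lasting RTW within the horizon and 0 for censoring.
    """
    if days_to_lasting_rtw is not None and days_to_lasting_rtw <= horizon:
        return days_to_lasting_rtw, 1
    # Still on sick-leave at the horizon: censored at the horizon.
    return horizon, 0
```

A participant who returned to work after 45 days would be coded as (45, 1); a participant still on sick-leave at the horizon would be coded as (182, 0).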

Selection of Prognostic Factors

The selection of relevant prognostic factors was performed in two steps. First, the literature on prognosis for sub-acute LBP and sick-leave was reviewed [5–8]. On the basis of this review and the prognostic factors measured in the three RCTs we composed a list of prognostic factors. Second, we presented this list of potential prognostic factors by e-mail to 42 occupational physicians (OPs) working in different sectors. They were asked to judge, according to their expert opinion, whether each indicator would contribute to the development of chronic LBP. We also asked them whether the factors were modifiable by treatment. The indicators smoking, body weight and height, and years working in current job were not considered relevant after the literature review and the judgment of the OPs, and were therefore excluded. The prognostic factors were assessed by means of self-reported questionnaires before inclusion in the studies.

Potential Prognostic Factors

The following factors were considered important: age (years), gender (male/female), duration of complaints prior to randomization (in weeks), radiation to one or both legs (yes/no) and treatment during study enrolment (yes/no). Patients were also asked to rate the level of certainty that they would be working full-time in 6 months' time (“self-predicted certainty”). For this prognostic factor they answered the question “How certain are you about full work resumption at 6 months” using a 5-point scale: “completely uncertain”, “a little certain”, “somewhat certain”, “certain”, “completely certain”. Physical activity was measured with the Baecke questionnaire [15]. Pain intensity and functional status at baseline were scored by using a numerical visual analogue scale (VAS, range 0–10) [16] and the Roland Disability Questionnaire (RDQ, range 0–24) [17], respectively. Potential job-related physical factors were measured by the section ‘Musculoskeletal Workload’ of the Dutch Musculoskeletal Questionnaire (DMQ) [18]. The physical factors we examined were daily lifting, bending and twisting of the trunk, driving a vehicle at work (whole body vibration), and stooping, each with the answer categories: “never”, “sometimes”, “quite frequently”, “much frequently”. The DMQ is used to identify high and low risk groups for musculoskeletal disorders like LBP and has fair validity. Potential work-related psychosocial factors were measured by means of a Dutch version of Karasek’s Job Content Questionnaire (JCQ). Dimensions of this questionnaire are: quantitative job demands, decision authority, skill discretion, supervisor support and co-worker support [19]. According to Karasek’s model, quantitative job demands were called job demands, decision authority and skill discretion were merged into job control, and supervisor and co-worker support were merged into social support.
Job satisfaction was assessed by means of a question concerning job task satisfaction with the answer categories: “no good”, “reasonable”, “moderate” and “good”. The Dutch version of the Tampa Scale for Kinesiophobia (TSK, range 17–68) was used to measure the extent to which people feared that exercise can lead to reinjury [20]. A high score indicates a high level of fear of physical activity or injury. Fear of movement, avoidance of activities and back pain beliefs were measured with the Fear Avoidance Beliefs Questionnaire (range 0–42) [21]. Coping with pain was measured with the Pain Coping Inventory (PCI) questionnaire. The PCI questionnaire measures cognitive and behavioural coping strategies of pain patients. The questionnaire consists of six subscales: transformation of pain, distraction, lowering demands, withdrawal, worrying, and resting [22]. The first three subscales were merged to obtain a measure of active pain coping and the latter three subscales were merged to obtain a measure of passive pain coping.

Analyses

Model Building Process

The Cox proportional hazard model was used to examine the prognostic value of each factor. In each Cox model RTW was fitted as the dependent variable and the prognostic factors as the independent variables. We adjusted for the effects of the interventions. The Cox proportional hazard assumption was affirmed by plotting log minus log survival curves.

To fill in variables with missing values, we applied multiple imputation by using the multiple imputation by chained equations (MICE) package [23]. This is a flexible imputation method, which allows one to specify the multivariate structure in the data as a series of conditional imputation models based on the information of other variables. We generated ten multiply imputed data sets.

Variable and model selection in these ten data sets was performed by a two-stage bootstrap modelling approach [24]. During the first stage, backward regression with a P value selection criterion of 0.157 [25] was applied to 200 bootstrap samples. The bootstrap samples were drawn with replacement from the imputed data sets and were of the same size as the original sample. On the basis of the selection frequencies in these bootstrap samples, prognostic factors that were selected in more than 50% of the regression models were included at the second modelling stage [24]. As a sensitivity analysis we also included factors that were selected in more than 40% of the models. In the second stage (500 bootstrap samples) backward regression was again performed on each of the bootstrap samples with the same P value selection criterion as in stage one. The frequency of selected models was then calculated by using the factors selected in the first stage.
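The first-stage selection rule can be illustrated with a short sketch. It assumes that the backward regression has already been run on each bootstrap sample and has returned the set of retained factors; the function names and the factor names in the usage note are hypothetical and do not reproduce the study data.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def inclusion_frequencies(selections: Iterable[Set[str]]) -> Dict[str, float]:
    """Fraction of bootstrap models in which each factor was retained."""
    selections = list(selections)
    if not selections:
        raise ValueError("at least one bootstrap selection is required")
    counts = Counter(name for chosen in selections for name in chosen)
    return {name: count / len(selections) for name, count in counts.items()}


def retained_factors(selections: Iterable[Set[str]],
                     threshold: float = 0.5) -> List[str]:
    """Factors selected in more than `threshold` of the bootstrap models."""
    freqs = inclusion_frequencies(selections)
    return sorted(name for name, freq in freqs.items() if freq > threshold)
```

With four hypothetical bootstrap models retaining {job_satisfaction, gender}, {job_satisfaction}, {job_satisfaction, pain} and {gender, pain}, only job_satisfaction passes the 50% criterion, whereas all three factors pass the 40% criterion used in the sensitivity analysis.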

The Cox regression coefficients, standard errors and 95% confidence intervals of the final model were estimated over the ten imputed data sets according to Rubin’s rules [26]. During the modeling process we also considered the balance between the number of variables and events/non-events in the models, which is recommended not to be lower than 10–15 events per variable [27].
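For a single regression coefficient, pooling according to Rubin's rules can be sketched as follows: the pooled estimate is the mean over the imputed data sets, and the total variance combines the average within-imputation variance with the between-imputation variance. The function name is illustrative, and the sketch stops at the pooled standard error, because the confidence interval additionally requires the degrees of freedom of the reference t distribution.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               std_errors: Sequence[float]) -> Tuple[float, float]:
    """Pool one coefficient over m imputed data sets (m >= 2).

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(se ** 2 for se in std_errors)   # average squared SE
    between = variance(estimates)                 # sample variance (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)
```

If all imputed data sets give the same estimate, the between-imputation variance is zero and the pooled standard error equals the within-imputation standard error.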

To make risk prediction available in daily practice, we transformed the final model into a clinical prediction rule.

Derivation of the Clinical Prediction Rule

  1. We calculated the regression coefficients of the Cox model for RTW over a period of 12 months follow-up and used these coefficients to calculate risk scores of prolonged sick-leave for more than 6 months follow-up. We chose this procedure because regression coefficients that are calculated over a period of 12 months follow-up will provide more accurate estimates of RTW than those calculated over a period of 6 months follow-up. These regression coefficients are re-used in step 3.

  2. The survival function under the Cox model is defined as: $S(t) = S_0(t)^{\exp(XB)}$, where $S(t)$ stands for each patient’s probability of prolonged sick-leave, $S_0(t)$ is the time-dependent risk of prolonged sick-leave when all predictor variables are zero, and $XB$ is the linear equation of the predictor variables ($X$) and the regression weights ($B$) [28].

  3. The survival function can be rewritten as: $S_0(t) = S(t)^{1/\exp(XB)}$. Within this formula the regression coefficients of step 1 are used to calculate the baseline risk of prolonged sick-leave for each patient according to his or her own time of follow-up.

  4. Subsequently, the baseline risk of prolonged sick-leave at 6 months could be identified from the patients with a follow-up time of 6 months.

  5. The survival function was then applied again. For $S_0(t)$ the risk of prolonged sick-leave at 6 months (obtained at step 4) was used for all patients. Only the values for the predictor variables ($X$) and the regression weights ($B$) of each patient were responsible for the differences in risk of prolonged sick-leave. Thus the following formula was used here: $S(6\ \text{months}) = S_0(6\ \text{months})^{\exp(XB)}$.

  6. To derive practical interpretable scores for the prediction rule, the Cox regression coefficients of step 1 were multiplied by 10 and rounded to the nearest integer. These points can be multiplied by the values for the predictors and added up for each patient to calculate the total risk score for prolonged sick-leave.
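The formulas in steps 2 to 6 can be sketched in a few lines. The function names, the coefficient names and the numerical values below are hypothetical and serve only to show the arithmetic; they are not the coefficients of the study. Note that Python's built-in round resolves exact halves to the nearest even integer, which may differ from the rounding convention used for the published scores.

```python
import math
from typing import Dict


def baseline_survival(s_t: float, linear_predictor: float) -> float:
    """Step 3: S0(t) = S(t) ** (1 / exp(XB))."""
    return s_t ** (1.0 / math.exp(linear_predictor))


def predicted_risk(s0_6m: float, linear_predictor: float) -> float:
    """Step 5: S(6 months) = S0(6 months) ** exp(XB)."""
    return s0_6m ** math.exp(linear_predictor)


def points(coefficients: Dict[str, float]) -> Dict[str, int]:
    """Step 6: multiply each coefficient by 10 and round to an integer."""
    return {name: round(10 * b) for name, b in coefficients.items()}


# Hypothetical values, for illustration only.
s0 = baseline_survival(0.30, 0.5)
print(predicted_risk(s0, 0.5))           # recovers approximately 0.30
print(points({"factor_a": 0.23, "factor_b": -0.18}))
```

Steps 3 and 5 are inverse operations for a fixed linear predictor, which is why the round trip above recovers the starting probability. In the prediction rule, step 5 instead uses a single baseline value at 6 months for all patients.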

Performance of the Prediction Rule

Discrimination was assessed with the c-index. The c-index equals the area under the receiver operating characteristic (ROC) curve in logistic regression [29]. Calibration refers to the agreement between the observed probabilities in the original data and the predicted probabilities of the prediction rule. To plot the calibration curve, the predicted and observed probabilities were presented on a scatter diagram. To determine the observed probabilities, the Kaplan–Meier estimate of the 6-month risk of not having returned to work was calculated for each decile of predicted risk [29]. The explained variance was also calculated, which is an indication of how much of the variance between patients in the outcome can be explained by the predictors [30]. We used bootstrapping to correct the regression coefficients, the c-index and the explained variance for over-optimism. Over-optimism is determined by the slope of the linear predictor, which was calculated in 200 bootstrap samples and tested in the original sample. The average difference reflects the amount of overfitting present and was used as a shrinkage factor to preshrink the regression coefficients. The model performance indices were calculated on each imputed data set and averaged over the ten imputed data sets.
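A simple pairwise version of the c-index for censored data can be sketched as follows. It assumes that a higher risk score predicts a longer time until RTW, counts tied scores as half concordant and ignores pairs with identical times. This is an illustrative simplification, not the routine used in the analyses, and the function name is hypothetical.

```python
from typing import Sequence


def c_index(times: Sequence[float], events: Sequence[int],
            scores: Sequence[float]) -> float:
    """Concordance between risk scores and times until RTW.

    events[i] is 1 if RTW was observed for patient i and 0 if censored.
    A pair is comparable when the patient with the shorter time had an
    observed event; it is concordant when that patient also has the
    lower risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if scores[i] < scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to scores that carry no information about the ordering of the times, and a value of 1.0 to a perfect ordering.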

Software

The MICE as well as the backward selection procedures were performed with R software [23, 27].

Results

Table 1 gives an overview of the mean values and proportions of the potential prognostic factors and of the percentage of missing values. Most missing values occurred because some variables were included in only 2 of the 3 RCTs. After 12 months 577 patients had returned to work (92%), with follow-up times ranging from 4 to 364 days.

Table 1 Patient characteristics at baseline (n = 628) and the percentage of missing information for the potential prognostic factors of sick-leave

The inclusion frequencies of the factors at the first bootstrap selection stage ranged from 9.2 to 99.1% (Table 2). The indicator job satisfaction had the highest inclusion frequency. For bootstrap stage two, the six factors with inclusion frequencies of more than 50% were included, and with these six factors (plus the treatment factor) 46 different models were selected. As shown by the model with the highest selection frequency, longer work absence is related to job satisfaction, daily stooping, fear avoidance beliefs, pain intensity at baseline, duration of complaints and gender. This model was selected 41.6% of the time. We compared this model with the model obtained with factors selected at least 40% of the time. With this criterion, the final model included ten factors, compared to the six factors with the “50%” rule. The final model with ten factors was selected much less frequently across the stage two bootstrap samples (only 9.3% of the time), and it showed equal performance with regard to calibration and discrimination to the six factor model (data not shown). Therefore, we opted for the more parsimonious “50%” model as our best model.

Table 2 Selection frequencies of variables at step 1 and models at step 2 as a result of the two-step bootstrap model averaging (BMA) approach

With this model a clinical prediction rule was derived, which is presented in Table 3. In this prediction rule the variable daily exposure to stooping unexpectedly contributed to a lower risk of prolonged sick-leave. We tested the performance of the prediction rule with and without this variable, and both models performed equally well (data not shown). The prediction rule is therefore presented without the variable daily exposure to stooping. Table 3 shows that longer work absence is related to moderate to poor job satisfaction, a higher score for fear avoidance beliefs, higher pain intensity at baseline, a longer duration of complaints and female gender.

Table 3 Factors included in the multivariable model together with the index for discrimination (c-index) and calibration (slope)

With the risk scores presented in Table 3, the risk of developing prolonged sick-leave of more than 6 months can be calculated for each individual patient. For example, a male LBP patient, who experiences moderate job satisfaction, is not extremely fear avoidant (score of 15), has a low pain intensity score (score of 2), and a long duration of complaints (8 weeks) will have a total risk score of −2 (gender) + 2 (job satisfaction) + 3 (fear avoidance) + 1.2 (pain intensity) + 0.08 (duration of complaints) = 4.3. This patient will have a risk of developing prolonged sick-leave of more than 6 months of 12% (last column of Table 4).
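The arithmetic of this worked example can be checked with a few lines. The contributions are copied from the example above, and the dictionary keys are descriptive labels only.

```python
# Contributions to the total risk score, taken from the worked example.
contributions = {
    "gender (male)": -2,
    "job satisfaction (moderate)": 2,
    "fear avoidance (score of 15)": 3,
    "pain intensity (score of 2)": 1.2,
    "duration of complaints (8 weeks)": 0.08,
}

total = sum(contributions.values())
print(round(total, 1))  # 4.3
```

The unrounded sum is 4.28, which is reported as 4.3.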

Table 4 Score categories, associated observed and predicted risk score probabilities of prolonged sick-leave and test characteristics of prediction rule at 6 months (%)

The bootstrap corrected explained variance of the model was 6% (Table 3). This indicates that 6% of the variance between patients in the outcome can be explained by the predictors in the model. The bootstrap corrected c-index was 0.63, which indicates that for 63% of the comparable pairs of patients the prediction rule assigns the higher risk to the patient with the longer sick-leave. The calibration slope was 0.9. This indicates that a little overoptimism is to be expected when applying the prediction rule in new workers with LBP.

The score categories and their related mean absence days and observed and predicted probabilities of prolonged sick-leave at 6 months are presented in Table 4. Also presented in this table are the test characteristics in terms of sensitivity, specificity and positive and negative predictive values at different risk score categories. The mean observed probability of sick-leave at 6 months averaged over all score categories was 18.5%. Overall, the observed probability was very close to the predicted probability. This pattern can also be seen in the calibration curve in Fig. 1. In general, the pairs of predicted and observed probabilities are near the ideal line of perfect calibration.

Fig. 1 Calibration curve of the prediction rule of the risk of prolonged sick-leave at 6 months (shrunken regression coefficients). The dotted line indicates perfect calibration

The sensitivity and specificity of the rule are moderate to low. If a cut-off level of ≥10 is chosen as an indication of a high risk of prolonged sick-leave, then 32% of the patients who actually progressed to prolonged sick-leave are correctly identified (sensitivity). Of the patients who did not develop prolonged sick-leave, 89% had a score of <10 and were thus correctly classified as low risk (specificity). More important for clinical practice are the predictive values of the rule. The negative predictive value (NPV) of 84% means that 84% of the patients with a low risk score of <10 will correctly not receive an intervention. However, 16% (1-NPV) of these patients will develop prolonged sick-leave but are not targeted for an intervention. The positive predictive value (PPV) of the rule at the score level of ≥10 is 41%. This means that in this high risk group an intervention is justified in 41% of the cases, because these patients actually will develop prolonged sick-leave. However, it also means that in 59% (1-PPV) of these high risk patients an intervention is applied although they will not develop prolonged sick-leave.
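The four test characteristics follow directly from a two-by-two table of predicted risk group against observed outcome. The sketch below uses hypothetical counts chosen only to illustrate the definitions; they are not the counts of this study, and the function name is illustrative.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Test characteristics from a two-by-two table.

    tp: high score, prolonged sick-leave    fn: low score, prolonged sick-leave
    fp: high score, returned to work        tn: low score, returned to work
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=30, fn=70, fp=50, tn=450)
print(metrics)
```

Sensitivity and specificity are conditioned on the observed outcome, whereas the predictive values are conditioned on the risk group, which is why the latter are the more relevant quantities for a clinician who only knows the score.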

Discussion

In this study we developed a model for the prediction of prolonged work absence at 6 months follow-up in a cohort of workers on sick-leave due to LBP by the use of a broad spectrum of demographic, work, LBP and psychosocial related prognostic factors. We identified that moderate to poor job satisfaction, a higher score of fear avoidance beliefs, higher pain intensity at baseline, a longer duration of complaints and female gender were associated with a higher risk of not returning to work at 6 months.

Comparison with Findings in the Literature

In our study “moderate” and “poor” job satisfaction was found to be responsible for a longer time until RTW. Job satisfaction in relation to RTW is part of the evidence that work-related psychosocial characteristics might be relevant factors for workers to maintain work [8]. In a recent review it was concluded that there is strong evidence that job satisfaction is not associated with a longer time off work [6]. Adding our study to this review would result in conflicting evidence of job satisfaction as an important factor associated with sick-leave due to LBP. This indicates that job satisfaction might influence the course of disabling LBP and that evidence of this factor has to be confirmed.

There is a lack of prognostic studies in occupational health care that examine associations between psychological factors, like fear avoidance beliefs, and RTW [31]. Psychological beliefs have been identified as important factors in relation to the course of LBP and accompanying sick-leave [32]. The effect of fear avoidance beliefs on prolonged sick-leave in our study was small. A more pronounced relationship of fear-avoidance with future LBP work absence was shown by Fritz et al. [33]. Although the role of psychosocial beliefs in association with RTW is inconclusive [7], the presence of this association in the context of other potential prognostic factors in an occupational setting needs further attention.

A higher level of pain intensity at baseline and a longer duration of complaints at study inclusion have been identified as relevant prognostic factors in several studies before [32, 34]. The duration of complaints can be related to the severity of the LBP, which in turn may be responsible for a longer work absence. Von Korff et al. [34] stated that pain intensity is related to pain severity and limitations in functioning and work. Von Korff et al. showed that back pain does not have to be present all the time. The pain may be present in the background for a longer time at a lower level of pain intensity and may flare-up. Flare-ups are frequently seen in chronic LBP patients and are in combination with higher levels of functional limitations, like problems with work activities, responsible for higher pain severity and consequently work absence [34].

In the current study gender, i.e., female sex, was associated with a higher risk of sick-leave at 6 months follow-up. The review of Steenstra et al. [6] showed that female gender is associated with a longer time off work. An explanation of this gender difference on sick-leave duration can be found in the vulnerability hypothesis. This hypothesis states that due to differences in biological (e.g., hormone physiology) or psychological factors (e.g., coping strategies) similar exposures at work might have a larger negative effect on women than on men [35].

In our study, a more frequent exposure to daily stooping did not contribute to a higher risk of prolonged sick-leave. Daily exposure to stooping is an aspect of the high work load that workers experience if they are frequently exposed to it. This factor was associated with more sickness absence due to LBP in an etiological study [36]. However, the role of this factor in the prognosis of sick-leave in LBP patients is less obvious. Some prognostic studies did find an effect of high workload on longer sick-leave [37, 38], but others did not [39, 40]. In the current study we tested whether including or excluding the variable stooping changed the estimates of the regression coefficients of the other variables in the final model or changed the model performance. Both situations produced similar results. Therefore, we chose to report the model without the variable daily exposure to stooping.

The explained variance of our prediction rule was 6%. In the study of Pransky et al. [41] an explained variance of 12% was reported. Dionne et al. [42] did not present a value for the explained variance. The explained variance of the prediction model in the study of Heymans et al. [43] was 23.7%. Other studies that developed prognostic models for RTW in an occupational setting reported explained variances of 18–30% [39, 44]. Clearly, the values of the explained variance differ strongly between studies, but they are not high in general. A low value for the explained variance means that prognostic factors can only explain a small fraction of the variance between individual patients. We still might have missed variables that play a role in the complex environment of occupational health care and that may influence the prognosis of sick-leave; for example, the maintenance of contact with the employer during the sick-leave period turned out to be important in a recent study [45]. It also has to be noted that the explained variance of a Cox regression model is low in general, even if there are strong and highly significant predictors in the model [46].

With respect to calibration, the slope index of our prediction model was 0.90. This means that the observed and predicted probabilities are in good agreement. The bootstrap-corrected c-index of our model of 0.63 was moderate. We found three similar prediction rules for LBP patients in an occupational setting that used the outcome measure RTW. Dionne et al. [42] developed a prediction tool to identify workers at high risk of adverse occupational outcomes. Pransky et al. [41] developed a practical screening model to predict length of disability after acute occupational LBP. Heymans et al. [43] developed a nomogram to predict work status in chronic LBP patients. Their nomogram had a slope index of 0.91 and a c-index of 0.76, both higher than in our current study. The studies of Pransky et al. [41] and Dionne et al. [42] did not report on calibration or discrimination of their prediction tools. Pransky et al. also did not report on PPV or NPV. Dionne et al. reported PPVs of 33–57% and NPVs of 74–91% for their model. The study of Heymans et al. [43] reported PPVs in the range of 70–95% and NPVs in the range of 33–100%. With respect to the practical implications of our prediction rule, we are aware that caution is needed when using it in practice. Considering an adequate cut point, at the score level of 2 the NPV is 98%, which means that the large majority of patients with a score lower than 2 will correctly not receive an intervention. At the score levels of 6 and 10 the NPVs of 89 and 84%, respectively, are still high. The PPV of 41% at the score level of 10 is moderate, which means that 41% of the patients with a score of 10 and higher (i.e., patients with a poor prognosis) correctly receive an intervention. However, 59% (100%-PPV) of the patients with a score of 10 and higher receive an intervention despite their good prognosis.
This may lead to misuse of health care resources, high treatment costs, potential iatrogenesis, or a longer sick-leave duration than expected because patients think that treatments should be completed before full RTW is reached. However, from an employer’s perspective it seems more attractive to refer each patient with a score of 10 or higher to a specific treatment to stimulate work ability and work resumption.

Remarks on Our Study: Choices in the Data Analysis

In our prediction model we adjusted for the treatment effects that were examined in each RCT that delivered the patient data. Therefore, treatment effects may have influenced prediction of RTW. However, the treatment effects of usual care [11–13] and of the low and high intensity back schools [13] were small. The graded activity intervention showed a beneficial effect on RTW in one RCT [11] and an opposite effect on RTW in another study [12]. These interventions will therefore have a small impact on prediction of RTW. A workplace intervention proved to stimulate RTW [12]. The prognostic score of a patient may slightly improve when he or she is referred to this intervention. Whether the risk scores obtained by our prediction rule can be improved in new patients by specific interventions has to be confirmed in a future RCT. We therefore present our prediction rule without involvement of the treatment variable.

Study Strengths and Limitations

The strength of our study is that we were able to include almost all variables that are mentioned in the literature as prognostic factors. Moreover, the selection of relevant predictors was not only based on evidence in the literature but also on clinical expertise of OPs. This may enhance clinical applicability of the prediction rule. According to the Dutch occupational health care guidelines, workers have to visit their OP when they are on sick-leave for not more than 8 weeks due to LBP, which was also the case in our study. Therefore, our study results are generalizable to Dutch occupational health care practice.

The success of using clinical prediction rules in practice depends among other things on how much time it will take for the patient and/or clinician to determine the final risk score and probability of outcome. Our prediction rule consists of five variables that can easily be answered by the patient. Most variables consist of one question only, i.e., job satisfaction, pain intensity, duration of complaints and gender. For the variable fear avoidance beliefs 16 questions have to be answered, which makes a total of 20 questions. The prediction rule can easily be offered via a desktop computer, in an Excel format or as a web application. In this form it will take 5–10 min to fill in the prediction rule, and it might be feasible to administer the rule on a routine basis just before or during the appointment with the clinician.

A limitation of our study is that we did not test the generalizability of the prediction rule in new similar patients (external validation) [29]. Even after correction for optimism the performance of the model may decrease due to other population characteristics. The need for external validation after adjusting for optimism was for example reported in the study of Bleeker et al. [47]. However, there are good indications that internal validation by using bootstrap-corrected indices of discrimination produces estimates that can also be expected in future patients [48].

Another limitation of our study is the low explained variation and moderate clinical performance of the prediction rule. This means that not every individual risk profile may be accurate enough to guide treatment decisions in practice. As a consequence, caution is needed when using the prediction rule in practice.

Conclusion

Our study confirmed the importance of demographic, work, LBP, and psychosocial related factors for the prediction of long-term sick-leave. These factors were used to derive a clinical prediction rule to predict the risk of prolonged sick-leave of more than 6 months due to LBP. With a risk score cut-off of 10 or higher, the NPV is high, thereby preventing treatment in persons at a low risk of prolonged sick-leave. However, the performance of the prediction rule is only moderate and the explained variance is low.