Introduction

Quality of care has been defined as: doing the right thing, at the right time, in the right way, for the right person, and having the best possible results [1]. The challenge is how to measure quality of care in daily practice. Results at the level of the patient’s health status can be measured with patient-reported outcome measures (PROMs), patient-reported experience measures (PREMs), and/or physical performance measures. PROMs are questionnaires or single-item scales measuring aspects of a patient’s health status directly reported by the patient, e.g. perceived pain. PREMs are questionnaires measuring the experience of patients with health care, e.g. communication with the health care professional. Physical performance measures are clinical tests to measure physical function, e.g. 6-min walking test. Outcome measures should be well developed and unidimensional in order to generate information regarding the construct of interest [2]. Using such combined outcomes in daily practice is proposed to facilitate the interaction between patient and health care professional, including the process of shared decision-making, goal-setting, and evaluation of treatment effects [3, 4].

Outcomes measurement can also be useful to provide transparency about the process and the intervention effect at the group level in order to facilitate quality improvement trajectories, to provide information for patients, and for pay-for-performance purposes [5,6,7]. For successful implementation of outcome measures, patients and health care professionals need to accept a common set of outcomes to be measured as having added value in daily practice. In physiotherapy practice, multiple outcome measures are being used in clinical decision-making. Routine data collection in daily practice opens the opportunity for establishing large data sets with patient outcomes. Standardization of these measurements is necessary to enable comparison of intervention effects [3, 8].

Currently, limited data are available about the quality of daily care of patients with non-specific low back pain (NSLBP) treated in physiotherapy practices in the Netherlands. Therefore, this study focuses on the development of a standard set of outcome measures for NSLBP, the most common health condition of patients visiting physiotherapy in primary care practice. The final set of measures should be accepted as having added value in clinical practice and will have to be useful to compare the outcomes at the level of the individual patient, and for measuring and improving quality of physiotherapists and their practices.

Previous international studies showed several initiatives for developing outcome sets for low back pain [9,10,11,12,13,14]. Most of these core outcome sets were developed for clinical trial purposes and have not been tested with regard to relevance and feasibility in the evaluation of quality of care in daily practice. A few studies on NSLBP in physiotherapy showed that higher guideline adherence was associated with better outcomes and lower utilization of care [3, 15]. This stresses the value of gaining insight into outcomes on a larger scale. Successful implementation of outcome measures in daily practice can be improved by stakeholder engagement in quality improvement initiatives [16, 17]. It is therefore important to include all relevant stakeholders in Dutch physiotherapy concerning NSLBP in the development of the current standard set of outcomes.

In this study, NSLBP was defined as pain and discomfort, localized below the costal margin and above the inferior gluteal folds, with or without leg pain, and not caused by specific pathology [18]. NSLBP is an example of a patient group with high variation in level of recovery, with time to recover varying from one day to multiple years. Stratification of patients with NSLBP into subgroups, taking into account differences in characteristics based on prognostic profiles, is increasingly popular [19,20,21,22]. Outcome sets combined with stratified care will be more precise and useful for the development of quality indicators and better accepted for quality improvement [23].

The aim of this study was to develop a clinical standard set of outcome measures in patients with NSLBP—taking into account classification in clinically relevant subgroups—that is accepted for relevance and feasibility by stakeholders, and deemed useful for (a) interaction between patient and health care professional, e.g. shared decision-making in goal-setting and monitoring and feedback based on outcomes, (b) internal quality improvement, and (c) external transparency of primary care physical therapist practices.

Methods

Design

This study used a mixed-methods design in Dutch physiotherapy practices. The study was conducted between October 2016 and July 2017. An advisory board was formed with representatives of the Dutch Patient Association for Back Pain (NVVR), the Royal Dutch Society for Physiotherapy (KNGF), the Association for Quality in Physiotherapy (SKF) and two representatives of health insurance companies in the Netherlands (CZ & DFZ) to monitor and evaluate the process and to facilitate the implementation by providing information to, and receiving information from, stakeholder groups.

We used a consensus-driven modified RAND-UCLA Delphi method to select relevant outcomes [24] in seven separate steps (see Table 1). Informed consent was obtained from all individual participants included in the study, and all procedures were conducted according to the Declaration of Helsinki.

Table 1 Steps during the consensus-driven modified Delphi method

Step 1. Explorative review of the literature

We searched the Guidelines International Network (G-I-N) and PEDro databases for outcome measures based on PROMs, PREMs and physical performance measures with adequate psychometric properties, including reliability, validity, and responsiveness [25, 26]. All multi- and mono-disciplinary Dutch and international clinical practice guidelines for physiotherapists, general practitioners and medical specialists were included. We also searched websites of organizations developing clinical practice guidelines (see ‘Appendix A’). Based on the identified guidelines, reference tracking was performed. We preferred outcome measures that were already used in daily practice in the Netherlands. We also searched for structure, process and outcome measures in existing indicator sets; see ‘Appendix B’ for the search string. We used a pragmatic explorative approach and did not aim at conducting a systematic review of the literature. In the next step we analysed all eligible measures on their validity and reliability. The following information was gathered: type of measure (process, structure, outcome), type of questionnaire/instrument, targeted patient group, content of the questionnaire/instrument, time to complete the questionnaire/instrument, the minimal clinically important difference (MCID), domain (e.g. pain), related measurements, whether the questionnaire/instrument had already been translated into Dutch, and supporting literature.

We used the prognostic profiles of the KNGF guideline as the primary classification for subgroups related to the course of recovery based on prognostic factors [27]. Then, we used the same literature search as for the outcome measures to compare the prognostic profiles of the KNGF with multi- and mono-disciplinary Dutch and international clinical practice guidelines. If necessary, we combined useful elements of different guidelines with the profiles of the KNGF guideline. In the identified guidelines, reference tracking was performed. Additionally, the PubMed database was screened between January 2012 and December 2016 for systematic reviews about individual prognostic factors in NSLBP; see ‘Appendix B’ for the search string. The individual prognostic factors were used to supplement the search for prognostic profiles. After the screening we selected all useful factors for prognostic profiles to classify subgroups for NSLBP.

Step 2. First online survey round

We recruited 43 Dutch physiotherapists via contact networks of the KNGF and SKF for participation in the Delphi rounds; this was a purposive sample. All participants needed to have ample treatment experience in patients with NSLBP, or experience in scientific research on NSLBP, or both. The goal of this step was to rate all measures selected in step 1 on a 9-point Likert scale for relevance and feasibility. Afterwards, the participants rated the appropriateness and feasibility of prognostic profiles for NSLBP. We conducted the online survey in LimeSurvey, version 2.06.

Step 3. Expert committee

We invited four participants of the online survey with complementary expertise to join an expert committee. The committee discussed the results of the online survey and rated the outcome set and prognostic profiles on their content validity and reliability.

Step 4. Patient interviews

We invited six patients who had been treated by physiotherapists for NSLBP in the past year. The patients were recruited via a convenience sample of six physiotherapy practices. Each physiotherapy practice included one adult patient who had been treated for NSLBP in the last year. Once the patient had given informed consent, the physiotherapist passed the patient’s contact information to the researcher. Short semi-structured telephone interviews of approximately 30 min were held by two researchers (KV and JL), and a topic list was used. The aim of the interviews was to gain insight into the patient perspective on the relevance and feasibility of the use of questionnaires (PROMs and PREMs), physical performance measures, process and structure measures, and into the extent to which measurements can be used for improving the quality of care. The interviews were audio-recorded, transcribed verbatim and analysed using thematic analysis. KV and JL independently analysed the interviews and assigned codes within and between the interviews. Afterwards, the assigned codes were compared and grouped together in broader categories and themes. The most important themes were presented during the consensus meeting.

Step 5. Consensus meeting

All participants of the online survey were invited to a three-hour consensus meeting, together with policy makers and members of the advisory board. We used the nominal group technique (NGT) to structure the meeting [28, 29]. The NGT is specifically designed and widely used for reaching consensus statements among experts on a given topic [28, 29]. The different steps in the NGT help to give all participants a voice in the consensus process [30]. During these steps the participants rated, discussed and then re-rated the eligible structure, process and outcome measures. We discussed all measures scored in the first online survey and presented the results of steps 2–4, followed by a second on-site rating of the relevance and feasibility of the measures. A measure was included in the standard set if it received 80% or more of the votes in the yes/no rating. All measures that scored between 60% and 80% in the on-site rating were deferred for discussion in the second online survey. All measures that received between 0% and 60% of the votes were excluded.
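The voting rule of the consensus meeting can be illustrated with a minimal Python sketch. It is not the authors' software: the function name `classify_measure` and the vote counts in the demo are hypothetical, and the handling of a share of exactly 60% (deferred rather than excluded) is an assumption, because the text does not specify which band that boundary belongs to.

```python
from typing import Literal

Decision = Literal["include", "defer", "exclude"]


def classify_measure(yes_votes: int, total_votes: int) -> Decision:
    """Classify one measure from its yes/no votes (hypothetical helper).

    Thresholds follow the text: 80% or more is included, 60% to 80% is
    deferred to the second online survey, and anything lower is excluded.
    Assumption: exactly 60% counts as deferred.
    """
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    if not 0 <= yes_votes <= total_votes:
        raise ValueError("yes_votes must lie between 0 and total_votes")
    # Integer arithmetic avoids floating-point issues at the boundaries.
    if yes_votes * 100 >= 80 * total_votes:
        return "include"
    if yes_votes * 100 >= 60 * total_votes:
        return "defer"
    return "exclude"


if __name__ == "__main__":
    # Illustrative vote counts only; these are not results from the study.
    example_votes = {"measure A": (13, 16), "measure B": (12, 16), "measure C": (9, 16)}
    for name, (yes, total) in example_votes.items():
        print(name, classify_measure(yes, total))
```

Comparing integer products rather than a floating-point share keeps the behaviour at exactly 80% and exactly 60% predictable.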

Step 6. Second online survey round

All participants were invited for the second online round. All outcome measures that received between 60% and 80% of the votes were re-rated on relevance, and if needed, alternatives that were discussed in step 5 were rated. Based on these results, we developed a final outcome set. We considered the set and the prognostic profiles accepted when the median panel rating was 7 or higher on a 9-point Likert scale.
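As a minimal sketch of the acceptance criterion in this step, the following Python function checks whether the median of a panel's ratings reaches 7 on the 9-point scale. The function name `is_accepted` and the example ratings are hypothetical; the study itself collected ratings in an online survey and does not describe any code.

```python
from statistics import median
from typing import Sequence


def is_accepted(ratings: Sequence[int], threshold: int = 7) -> bool:
    """Return True if the median rating meets the threshold (hypothetical helper).

    Ratings are assumed to be integers on a 9-point Likert scale (1 to 9).
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 9 for r in ratings):
        raise ValueError("ratings must be on a 1-9 scale")
    return median(ratings) >= threshold


if __name__ == "__main__":
    # Illustrative ratings only; these are not data from the study.
    print(is_accepted([7, 8, 9, 6, 7]))
    print(is_accepted([5, 6, 7, 8]))
```

Because the criterion uses the median, a question can be accepted even when some individual panellists rate it below 7.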

Step 7. Final approval of the advisory board

Finally, the advisory board was asked to accept the final outcome set. The goal was to inform every stakeholder about the conclusions and implications of the study and to increase the acceptability. All representatives were asked to take the responsibility for communication of the results of the study in their own organization.

Results

Participants

In step 2, 32 of the 43 panellists (response rate: 70%) completed the survey. The mean age of the panellists was 42 years, and 75.8% were men. In step 3, four expert physiotherapists (FM, DH, BM, and BH) agreed to participate. During step 4, six semi-structured interviews with six patients with NSLBP were held. The age ranged from 42 to 73 years with an average age of 56; four of the participants were men. In step 5, the consensus meeting, 16 of the 43 physiotherapists and researchers participated (response rate: 37%), as well as seven policy makers and members of the advisory board. The patient representatives were unable to attend the meeting. For the second online survey in step 6, 29/43 (response rate: 68%) respondents participated. Finally, in step 7, the five members of the advisory board participated.

Step 1. Explorative review of the literature

We identified 27 measures, of which 13 were eligible for further investigation: six PROMs, two PREMs, two additional outcome measures, two process measures, and one screening tool (see Table 2).

Table 2 Result of step 2: first online survey and step 5: consensus meeting

The reasons for exclusion of the 14 measures were as follows: not familiar in the Netherlands, not developed and useful for NSLBP, not primarily advised by guidelines, and not for physiotherapy primary care purposes. For a clear description of all included measures and an overview of all excluded measures, see ‘Appendix C’.

Development of prognostic profiles

We identified 19 guidelines, of which ten described useful information about prognostic profiles for NSLBP or individual prognostic factors [21, 27, 31,32,33,34,35,36,37,38,39]. The remaining nine guidelines were focused on specific pathology. To develop prognostic profiles we focused primarily on the Dutch KNGF guideline and compared it with other guidelines [27]. The majority of the guidelines specified two or three prognostic patient profiles based on the expected time of recovery. Some guidelines did not provide prognostic profiles in a table, but described them narratively [35,36,37]. Furthermore, all guidelines described individual prognostic factors that are associated with the course of recovery. See ‘Appendix D’ for a summary of all useful components.

Based on the outcomes of the literature review, we distinguished three prognostic profiles for NSLBP (A, B, and C), all profiles containing four characteristics. These characteristics are generally based on prognostic (psychosocial) factors of the KNGF guideline. Some guidelines described the expected time of recovery in weeks; this was added to our prognostic profiles. We identified prognostic factors based on back pain-related factors, individual factors, work-related factors, and psychosocial factors. The profiles are described in ‘Appendix E’.

Step 2. First online survey round

Of the 13 measures that were rated, five outcome measures scored a median of ≥ 7 on relevance, and nine outcome and process measures scored a median of ≥ 7 on feasibility. The prognostic profiles scored a median of ≥ 7 on relevance and feasibility; see Table 2 for a more detailed overview of the survey.

Step 3. Expert committee

During the expert meeting all 13 selected outcome and process measures were discussed for validity and reliability. The experts accepted the prognostic profiles as having added value in daily practice. They made some suggestions because, in their opinion, the prognostic profiles are not yet accurate enough to predict which individuals will develop chronic pain. For example, prognostic factors of a patient in profile A can also be seen in profile B. The experts advised against using the categories acute, sub-acute, and chronic low back pain in the prognostic profiles. The experts suggested selecting outcome measures per profile. The experts stated that it will be necessary to perform a solid pilot study to test the selected outcome and process measures on feasibility before the outcome set can be considered valid and reliable for quality improvement purposes.

Step 4. Patient interviews

The following themes were identified: (1) patient satisfaction, (2) administration, (3) number of treatment sessions, (4) transparency, and (5) PROMs. Almost every patient agreed that satisfaction with the given treatment and the treatment effect is relevant for quality evaluation purposes. Clinical record keeping is important to monitor the effect of treatment, to support a colleague during takeovers, and is valuable for evaluating quality of care, but should also be brief. The number of treatment sessions can be useful to evaluate quality of care, depending on the patient group. Some patients stated that transparency about outcomes of care could help them to choose health care professionals, and other patients said that they preferred the advice of a doctor, therapist, or family member. Opinions differed on whether PROMs were relevant for quality evaluation purposes. Most patients stated that the readability of PROMs is good. Some patients said that pain and functional problems were useful elements to score in PROMs, as were psychosocial factors, because these may influence the effect of treatment. See ‘Appendix F’ for the themes, categories, and codes.

Step 5. Consensus meeting

After presentation of the results of previous steps, the panellists could vote yes or no per measure on whether it should be added to the final outcome set; all results are presented in Table 2. The Quebec Back Pain Disability Scale (QBPDS), Numeric Pain Rating Scale (NPRS), Global Perceived Effect—Dutch Version (GPE-DV), and STarT Back Screening Tool (SBT) were included in the final outcome set directly. The Oswestry Disability Index (ODI), Patient-Specific Functional Scale (PSFS), number of treatment sessions, and prognostic profiles scored between 60 and 80%, and the following suggestions for adapting the measures were made during the consensus meeting. The panellists noted that the ODI is less common in the Netherlands than the QBPDS. Nevertheless, the ODI is widely used internationally and has good psychometric properties. The panellists suggested adding the ODI to the outcome set and comparing the ODI with the QBPDS in a pilot study. The pilot study opens the opportunity to reflect on preferences of the field and to compare the feasibility, acceptability, and responsiveness of both instruments. In the meantime, physiotherapists in the field can choose between the ODI and QBPDS. The panellists stated that the number of treatment sessions should be changed to measuring the total costs of the episode. The panellists suggested that the STarT Back Screening Tool (SBT) could replace the prognostic profiles.

Step 6. Second online survey round

The following four questions related to the remaining measures were rated on a 9-point Likert scale: ‘Do you agree that we use the SBT to classify patients in subgroups?’, ‘Do you agree that we add the PSFS in the outcome set?’, ‘Do you agree that the ODI will be tested in a pilot study in comparison with the QBPDS?’, and ‘Do you agree that treatment costs should be evaluated but will not be added in the outcome set?’. In the online survey we presented the final outcome set and prognostic profiles as in Table 3. All questions scored a median of 7 or higher, and therefore, the outcome set was accepted.

Table 3 Final set of measures accepted by all stakeholders

Step 7. Final approval of the advisory board

The final set with outcome measures was accepted by the advisory board. They accepted the outcome set by signing an official approval document.

Discussion

This study presents a standard set of six clinical outcome measures in patients with NSLBP in primary care physiotherapy, which includes the Quebec Back Pain Disability Scale (QBPDS), Oswestry Disability Index (ODI), Patient-Specific Functional Scale (PSFS), Numeric Pain Rating Scale (NPRS), Global Perceived Effect—Dutch Version (GPE-DV), and the STarT Back Screening Tool (SBT). The outcome measures are intended for use in the interaction between patient and physiotherapist, for internal quality improvement, and for external transparency.

In our study, the STarT Back Screening Tool was selected to allocate patients to subgroups, although research has shown that caution is required when interpreting prognostic tools [20]. Karran et al. (2017) concluded that prognostic screening instruments in primary care performed poorly at assigning higher risk scores to individuals who develop chronic pain than to those who do not [20]. However, other research showed that identifying subgroups of patients with NSLBP is still promising for future health care [40]. Multiple researchers support this view [20,21,22]. During a pilot study, we should test whether the STarT Back Screening Tool is reliable and valid for classifying patients in subgroups.

The fundamental difference with existing outcome sets for low back pain is that this outcome set is accepted by stakeholders as having added value in daily care [9,10,11,12,13,14]. Therefore, this new standard set provides a more promising basis for the implementation of quality indicators in clinical practice. Stakeholder engagement is essential for successful implementation of quality improvement initiatives. In comparison with traditional Delphi methods we performed additional activities to reach consensus based on the RAND-UCLA appropriateness method. Along with the anonymous online surveys and consensus meeting, we conducted an expert meeting, interviewed patients, and consulted an advisory board. With these steps, stakeholders were encouraged to use this outcome set in daily practice. In our study the patients were included to reflect on the selection of measures for the standard set and not as a separate qualitative study. The interviews were limited to six patients, and we did not reach data saturation, which could lead to bias. However, the interviews provided sufficient insight into the patients’ views on measurements in clinical practice.

During the consensus study we found that the physiotherapists and other stakeholders showed a positive attitude towards developing the standard set and its described goals. This may not be representative of the total population of physiotherapists in the Netherlands. The panellists of this study may have been early adopters who are open to quality improvement or external transparency. During implementation of this standard set we will need to anticipate that physiotherapists need more information about the benefits of standardizing outcome measurements in order to provide insight into, and to compare, intervention effects. Implementation strategies may need to be aimed at knowledge, skills, and attitudes of physiotherapists.

For pragmatic reasons we were not able to let panellists rate the outcome measures on a 9-point Likert scale during the consensus meeting, as preferred by the RAND-UCLA method [24]. Instead, the panellists voted yes or no. This may have influenced the voting, and panellists may have felt peer pressure with this method. We verified whether the panellists felt comfortable with the procedure, and they agreed and felt safe to give their opinion. We do not expect that the results would have been different when using a Likert scale.

Before implementation of a standard set in daily practice, it is important to develop an infrastructure for collection of the data [3, 8]. For example, the standard set must be implemented in the Electronic Health Record (EHR) that physiotherapists use for their clinical record keeping [3]. This EHR must be connected to a secure central database before the outcomes can be analysed. The infrastructure must also make it possible to give practices and physiotherapists feedback on outcomes that is useful for quality improvement.

In this study we present a consensus-based standard set of outcome measures that is accepted for relevance and feasibility by stakeholders. Therefore, this standard outcome set provides a promising basis for further development of quality indicators in physiotherapy practice [41]. The standard set is currently being used in daily practice and tested for validity and reliability in a pilot study before it can be used for the development of quality indicators [7]. All stakeholders should stay engaged during further implementation of the standard set.