Self-reported prognostic factors in adults reporting neck or low back pain: An umbrella review

Background: Numerous systematic reviews have attempted to synthesize evidence on prognostic factors for predicting future outcomes such as pain, disability and return-to-work/work absence in neck and low back pain populations. Databases and data treatment: An umbrella review of systematic reviews was conducted to summarize the magnitude and quality of the evidence for each prognostic factor investigated. Searches were limited to the last 10 years (2008-11th April 2018, updated 28th

Patients with neck and low back pain (NLBP) are a heterogeneous group where the prognosis, and factors associated with likely future outcome, vary substantially between individuals. This means that a tailored approach to informing patients about their prognosis, and making individualized decisions regarding optimal management, may be more beneficial than a one-size-fits-all approach (Foster et al., 2013). However, evidence to support tailoring of self-management and treatment to the needs of the individual is still limited. Health professionals often lack appropriate prognostic information to support optimal decisions regarding management, referral and monitoring of symptoms (Saragiotto et al., 2016). Numerous cohort studies investigating clinical and psychosocial prognostic factors have been conducted, and have been summarized across a large number of systematic reviews, making it difficult to access evidence regarding prognostic information. Furthermore, only a few reviews have identified generic prognostic factors that are relevant across different musculoskeletal conditions, including neck and back pain (e.g. De Vos Andersen et al., 2017; Kamper et al., 2008).
Overviews of systematic reviews, or umbrella reviews, are a way of bringing together the evidence from numerous systematic reviews on a similar topic (Aromataris et al., 2015; Smith et al., 2011). The process uses similar methods to traditional systematic reviews, but has systematic reviews themselves as the unit of analysis, rather than single studies (Smith et al., 2011). As the number of published umbrella reviews has increased, guidance on their conduct and reporting has also been developed (Aromataris et al., 2015; Smith et al., 2011). Given the large body of evidence in the area of prognosis reviews for NLBP, we conducted an umbrella review, making sure that we captured as much evidence as possible while taking into account the quality of evidence and the risk of bias in identified systematic reviews.
This review is part of a wider EU-funded project (Back-UP) that aims to develop a cloud-based computer platform for patients, clinicians and occupational health providers, which will generate prognostic information regarding NLBP based on information received from patients. This information will be visually presented as individual predicted scores (over 6 months) for three outcome domains: pain, function and return-to-work/work absence. In addition, patients and their clinicians will receive information on whether there is a low, medium or high risk of persistent pain and disability, and will be provided with a list of recommended matched treatment options for each risk group to inform shared treatment and referral decisions. To help with the development of this platform, an evidence synthesis was needed to identify self-reported psychological, social, work and clinical factors most likely to predict future pain, physical function and return-to-work/work absence outcomes in NLBP populations, and to summarize the quality of evidence for these factors.
Objective: To summarize evidence for self-reported prognostic factors predictive of disability, pain and/or return-to-work/work absence outcomes in patients presenting with low back or neck pain in ambulant or occupational healthcare.

| Eligibility criteria
We included systematic reviews and overviews of systematic reviews (umbrella reviews) which reported results for adult populations (18 years or older) with neck and/or low back pain of any duration, including whiplash, sciatica and radiculopathies, in any occupational or ambulant healthcare setting. We defined a systematic review as a review which carried out a systematic search of at least one electronic database, included critical appraisal of the primary prognostic factor studies, and synthesized the results. Eligible prognostic factors were any self-reported psychological, social or clinical variable; eligible outcomes were any measure of pain, functional disability and/or return-to-work/work absence. Systematic reviews were excluded if they focused on NLBP following severe trauma (fracture, spinal cord injury), on populations admitted to hospital (e.g. surgical populations), or only included prognostic factors that could not be self-reported by the patient. We also excluded reviews that did not meet the definition of a systematic review stated above, only addressed the methodology of prognosis studies, or did not present any information, statistical or narrative, on the strength of association between prognostic factors and outcome. Scientific meeting abstracts for which the full paper could not be obtained, and papers for which translations could not be obtained, were also excluded.

| Search strategy and Study selection
A comprehensive search strategy was developed by an information specialist (NC) to identify eligible papers. Four electronic bibliographic databases were searched: MEDLINE (OVID), EMBASE (OVID), CINAHL-Plus (EBSCO) and PsycINFO (EBSCO). Searches were limited to the last 10 years (2008-11th April 2018) in order to pick up the most recent systematic reviews and because it was considered unlikely that these reviews would miss relevant prognostic factor studies published prior to 2008. An update was carried out on 28th September 2020. The search strategy used subject headings and free text searching, combining terms for back or neck pain, prognosis and systematic reviews (see Appendix S2 for OVID MEDLINE search).
The results of all searches were downloaded into EndNote X9 (reference management software, Clarivate Analytics; available at https://endnote.com/) for title screening. A checklist of the eligibility criteria was used as a reference to aid the screening process. GM screened all titles, and the first 100 titles were also screened by a second reviewer (NC) to check for agreement. Agreement was 85% for these first 100 titles, and after discussion the inclusion and exclusion criteria were further refined. Once all titles had been screened to remove those clearly irrelevant to the review, the remaining articles were moved to Covidence (Veritas Health Innovation, Melbourne, Australia; available at https://www.covidence.org/) for abstract screening. Each abstract was screened independently by two reviewers (from NC, GM, SS and DvdW), with any disagreements resolved through discussion. Full texts were independently screened by two reviewers (NC, GM), with a third reviewer (DvdW) being consulted in the case of disagreements. Reasons for exclusion of a reference at the full text stage were recorded.

| Data extraction
Data extraction was conducted in two stages onto a standardized Microsoft Excel data extraction spreadsheet. One reviewer extracted data for a set of papers and this was independently checked by a second reviewer, with GM, NC, GW-J, SS and DvdW contributing to this process. In stage 1, data were extracted regarding: healthcare setting of studies included in the review; search dates; number of studies included; study design (cohort, RCT); characteristics of study populations (pain location, diagnosis, age); prognostic factors identified by the review; and strength of evidence for the factor. Where no values were given for the strength of association, the verbatim author conclusions were recorded. Extracted data were summarized for each prognostic factor, describing the proportion of reviews which found the prognostic factor to have an important and/or statistically significant association with outcome. In order to identify prognostic factors which had consistent evidence for their association with an outcome of interest, we set an a priori threshold: for a prognostic factor to be taken forward to stage 2, it had to be investigated by more than one systematic review, and 50% or more of those reviews had to find an association.
Due to variability in the definition of prognostic factors across reviews, synonymous terms were grouped into categories (e.g. the factors education, income/income level and socioeconomic status (SES) were labelled as an 'education/SES' category), and where necessary further sub-categorized (e.g. the factors life quality, QoL and general well-being were all labelled as a 'well-being' sub-category) for stage 2 of our synthesis. Prognostic factors were examined at category level to identify the number of reviews that had investigated that factor and, out of those reviews, how many had found at least one factor within the category to have an association with at least one of our outcomes of interest. If the overall category did not meet the criteria given above but a sub-category of factors did, then the sub-category was included in stage 2. The results of this process were tabled (see Table S1).
In stage 2, more detailed information was extracted on the measures used to collect the prognostic factors and outcomes, and whether or not a meta-analysis had been conducted. If a meta-analysis had been conducted, details were extracted about the number of studies included, total sample size used for the meta-analysis, follow-up time points, whether the meta-analysis was based on a fixed or random effects model, whether the meta-analysis was based on estimates from univariable or multivariable models, and the strength of the prognostic factor effect (summary estimate and 95% confidence interval). If no meta-analysis was performed, information was extracted regarding the methods used for a narrative evidence synthesis (e.g. whether or not this only took into account statistical significance of associations between prognostic factor and outcome, or also strength of the association, consistency of effects, risk of bias, etc.).
Data extraction forms for stage 1 and stage 2 were developed, pilot-tested and discussed within the study team to ensure all items of interest were collected within the forms, and to optimize consistency of data extraction.

| Risk of bias
As we preferred an approach to assessing risk of bias across different domains, which also addresses applicability of included reviews to the question of the umbrella review, we used the ROBIS (Risk of Bias in Systematic reviews; Whiting et al., 2016) tool for risk of bias assessment, which has been shown to have adequate measurement properties (Buhn et al., 2017; Pieper et al., 2019). The ROBIS tool was used to assess each systematic review included in stage 2 of our evidence synthesis. This second stage focused on further synthesis and grading of evidence for prognostic factors that had consistently been shown to be associated with an outcome of interest across multiple reviews. Assessment of risk of bias was therefore necessary for all systematic reviews investigating these prognostic factors (regardless of their results), but not for systematic reviews that only included factors without consistent evidence, which were not considered in stage 2.
The ROBIS tool covers four domains (study eligibility criteria, identification and selection of studies, data collection and study appraisal, and synthesis and findings). Each domain contains up to six questions, and a summary (Low, High or Unclear risk of bias) is given for each domain. A judgement is then made about the overall risk of bias, based on the findings from each domain. The guidance states that if all four domains were assessed as Low risk, then the study can be considered to have an overall low risk of bias. If one or more domain was assessed as High or Unclear risk, then reviewers were asked to consider particular areas of concern that could affect the overall judgement. We considered that for this particular review, which is aimed at identifying evidence for prognostic factors in NLBP, a key area was the quality of the search strategy, contained within the second domain of identification and selection of studies. If the search strategy had been judged as inadequate, a high risk of bias was given overall.
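The overall judgement described above can be sketched as a small decision rule. This is an illustrative sketch only, not the ROBIS tool itself: the function name, the domain keys and the `search_adequate` flag are assumptions introduced here, and cases the text leaves to reviewer discretion are returned as such rather than resolved.

```python
from typing import Mapping

# Hypothetical keys for the four ROBIS domains named in the text.
DOMAINS = (
    "eligibility_criteria",
    "identification_and_selection",
    "data_collection_and_appraisal",
    "synthesis_and_findings",
)


def overall_risk_of_bias(domain_ratings: Mapping[str, str], search_adequate: bool) -> str:
    """Return an overall rating from per-domain ratings ("Low", "High", "Unclear").

    Assumed ordering of the rules: an inadequate search strategy gives
    "High" overall; four "Low" domains give "Low"; anything else is left
    to the reviewers, because the text does not define a mechanical rule.
    """
    missing = [d for d in DOMAINS if d not in domain_ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")
    if not search_adequate:
        return "High"
    if all(domain_ratings[d] == "Low" for d in DOMAINS):
        return "Low"
    return "Reviewer judgement"


all_low = {d: "Low" for d in DOMAINS}
print(overall_risk_of_bias(all_low, search_adequate=True))
```

Returning "Reviewer judgement" keeps the sketch from implying more than the methods state: only the all-Low case and the inadequate-search case are described as fixed outcomes.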
Additionally, ROBIS assesses reviews regarding their relevance (applicability) for the review question, and whether or not the review authors mainly based their synthesis and conclusions on statistical significance. Results of the risk of bias assessment were tabled for each study, and the distribution of low, high and unclear risk of bias was graphically presented for each domain.

| Summary measures
For each prognostic factor, where available, the narrative conclusions of each review investigating this factor were presented. For those reviews that had conducted a meta-analysis, details were presented regarding the results of the meta-analysis, including summary estimates (with 95% confidence limits) presenting the strength of association.

| Synthesis of results
A narrative synthesis was performed in order to provide an overview of the magnitude and quality of evidence for each prognostic factor carried forward to stage 2. A list of the prognostic factors identified by the review authors as having a strong association with any of the outcomes of interest (pain, disability, or return-to-work/work absence) was collated.
Similar to the approach taken by Walton, Carroll, et al. (2013) in their umbrella review, both the age of the included review and the risk of bias assessment were taken into account when synthesizing the results. Greater weight was given to more recent reviews (published since 2015), and to those which were rated as having a low or moderate overall risk of bias. The use of more recent reviews also limited the impact of double counting evidence from prognostic factor studies included in multiple reviews (Walton, Carroll, et al., 2013).
The GRADE approach (Hayden et al., 2014) was used to grade confidence in the evidence for each of the prognostic factors. Following the methods and criteria proposed by Walton, Carroll, et al. (2013), high confidence was given to those prognostic factors for which consistent high-quality evidence was presented with at least one high-quality SR (low RoB) and no conflicting SRs. Moderate confidence was given to consistent findings from at least one recent medium-quality SR (moderate RoB), with the majority of findings from other concurrent SRs (where applicable) in the same direction of effect. Low confidence was given to a predictor when summary findings were of low or unclear RoB from the majority of SRs and with conflicting results, or when only a single SR reported significant but only moderate-level findings for that predictor. Very low confidence was given when none of the above conditions were met.
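The four confidence levels above form an ordered cascade, which can be sketched as follows. This is a hedged illustration, not the authors' procedure: each criterion is represented as a yes/no judgement that a reviewer would supply, and all class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceJudgement:
    """Reviewer judgements for one prognostic factor (hypothetical fields)."""
    low_rob_review_supports: bool              # at least one high-quality SR (low RoB)
    conflicting_reviews: bool                  # any conflicting SRs
    recent_moderate_rob_review_supports: bool  # at least one recent medium-quality SR
    majority_same_direction: bool              # most other SRs agree in direction
    majority_weak_and_conflicting: bool        # low criterion, first branch
    single_review_moderate_findings: bool      # low criterion, second branch


def grade_confidence(e: EvidenceJudgement) -> str:
    """Apply the criteria in order from high to very low confidence."""
    if e.low_rob_review_supports and not e.conflicting_reviews:
        return "high"
    if e.recent_moderate_rob_review_supports and e.majority_same_direction:
        return "moderate"
    if e.majority_weak_and_conflicting or e.single_review_moderate_findings:
        return "low"
    return "very low"


example = EvidenceJudgement(
    low_rob_review_supports=True,
    conflicting_reviews=True,
    recent_moderate_rob_review_supports=True,
    majority_same_direction=True,
    majority_weak_and_conflicting=False,
    single_review_moderate_findings=False,
)
print(grade_confidence(example))
```

Keeping the inputs as explicit judgements avoids inventing a mechanical definition of "consistent" or "conflicting", which the criteria leave to the reviewers.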

| Study characteristics
The characteristics of reviews included in Stage 1 are given in Table 1, with further detail available in Table S2. Ninety prognostic factors were identified across 41 reviews which included a mix of the specified outcomes of pain, disability and/or return-to-work/work absence. Of the 41 reviews, 27 focused on LBP only, 10 focused on neck pain only, and four reviews included both neck and LBP populations. Thirteen reviews looked at three prognostic factors or fewer; 10 investigated between four and 10 prognostic factors; and 18 reviews investigated more than 10 prognostic factors.
For a total of 25 prognostic factors or categories of factors, consistent evidence was found for an association with an outcome of interest. We defined consistent evidence as more than one systematic review investigating the factor and 50% or more reporting a statistically significant or important association (see also methods, data extraction). These factors were investigated in 35 systematic reviews, and were taken forward to stage 2 of data extraction. The results of this process are described in Table S1. For example, 15 reviews investigated baseline disability as a prognostic factor for pain, function and/or return-to-work/work absence outcomes. Of those 15 reviews, 10 found an association of disability with one of the outcomes, so this prognostic factor was taken forward to stage 2. However, baseline depression was investigated by 17 reviews but only 5 of these reported an association with one of the outcomes of interest. Depression was therefore not taken forward to stage 2.
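The stage 1 threshold and the two worked examples above can be checked with a few lines of code. This is a minimal sketch; the function name is an assumption introduced for illustration, and the counts are those quoted in the text.

```python
def meets_stage2_threshold(n_reviews: int, n_with_association: int) -> bool:
    """More than one review, and 50% or more of them report an association."""
    if not 0 <= n_with_association <= n_reviews:
        raise ValueError("n_with_association must lie between 0 and n_reviews")
    # Integer comparison avoids floating-point issues at exactly 50%.
    return n_reviews > 1 and 2 * n_with_association >= n_reviews


# (reviews investigating the factor, reviews reporting an association)
examples = {
    "baseline disability": (15, 10),
    "baseline depression": (17, 5),
}
decisions = {name: meets_stage2_threshold(*counts) for name, counts in examples.items()}
print(decisions)
```

Baseline disability passes (10 of 15 is about 67%), while baseline depression does not (5 of 17 is about 29%), matching the decisions reported above.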
Details of the 35 reviews included in Stage 2 of the review are presented in Table S3. Seven of the reviews had conducted a meta-analysis, and a further eight had planned a meta-analysis but had decided that it would have been inappropriate given the data retrieved by their review. Most reviews covered more than one of the specified outcomes (pain, functional disability and/or return-to-work/work absence).
[Flow diagram, stage 1 to stage 2. No further data extraction for 65 prognostic factors because they did not meet the following criteria: more than one systematic review investigating that factor; 50% or more of the reviews found a positive important or statistically significant association of that factor with one of the outcomes of interest. Stage 2: 25 prognostic factors included (evaluated in 35 systematic reviews).]

| Risk of bias within reviews
The results of the overall ROBIS risk of bias assessment are given in Table S4 (results for each review) and Figure 2 (results across reviews). Only 10 of the 35 included reviews were assessed as having an overall low risk of bias; eight as having high risk of bias and the majority of reviews (n = 17) were assessed as having an unclear risk.
Domain 2 covers aspects of identification and selection of studies for inclusion in the reviews. This domain in particular was rated as a high risk of bias for a large number of reviews (14 of the 35), which was primarily due to a poor search strategy and inappropriate restrictions on date, publication format or language (e.g. only including English language papers). Thirteen reviews (Campbell et al., 2013; Chou & Shekelle, 2010; da Silva et al., 2017; Hayden et al., 2009, 2019; Kamper et al., 2008; Kent & Keating, 2008; Lakke et al., 2009; Oosterhuis et al., 2019; Steenstra et al., 2017; Verwoerd et al., 2013, 2019; Wertli, Eugster, et al., 2014) were judged to have an overall low risk of bias for this domain.
Domain 4 covers aspects of data synthesis and findings. Six reviews (Alhowimel et al., 2018; Balaji et al., 2014; Oosterhuis et al., 2019; Ramond et al., 2011; Rashid et al., 2017; Wertli, Eugster, et al., 2014) were assessed as having a high risk of bias in this domain, and 13 were assessed as having an unclear risk of bias. As with domain 1, only five reviews reported registering their review and it was often difficult to judge the robustness of findings as few reviews (n = 7; Agnello et al., 2010; Hallegraeff et al., 2012; Hayden et al., 2019; Kamper et al., 2008; Kent & Keating, 2008; Valentin et al., 2016) had performed a meta-analysis (several items in this bias domain concern meta-analysis).

them having high or unclear risk of bias across several domains, but also due to two other elements of assessment in the ROBIS tool: the relevance of the included studies to the review's research question, and whether the reviewers based their findings (largely) on statistical significance, rather than also the strength, relevance and/or consistency of associations. Seven reviews (Agnello et al., 2010; Alhowimel et al., 2018; Chou & Shekelle, 2010; Hayden et al., 2009; Kent & Keating, 2008; Wilhelm et al., 2017) were assessed as not discussing the relevance of their included studies to addressing the research question, and 11 reviews (Alhowimel et al., 2018; Balaji et al., 2014; Hayden et al., 2009; Lakke et al., 2009; Oosterhuis et al., 2019; Ramond et al., 2011; Rashid et al., 2017; Sarrami et al., 2017; Wilhelm et al., 2017) were judged to have (or probably have, based on the available information) based their results on statistical significance only.

| Synthesis of results
A summary of the review findings for each of the prognostic factors is presented in Table 2. While information from all 35 included reviews is presented, the conclusions for each factor are weighted towards reviews conducted in the last five years (most recent reviews) and those which scored an overall low risk of bias on the ROBIS tool. For seven prognostic categories (disability/activity limitation; mental health; pain intensity; pain severity; coping; expectation of outcome/recovery; and fear-avoidance) there was moderate confidence that the association is robust (based on consistency of evidence across the included reviews for that category, the date the review was published, and the overall risk of bias score). In umbrella reviews, there is a risk of counting the same evidence multiple times, as the same studies may be included in multiple reviews. To try to account for this, we weighted our conclusions towards the most recently published reviews, in line with similar reviews (e.g. Walton, Carroll, et al., 2013), but acknowledge that this may not have completely resolved the issue of double-counting. A cross-check of double-counting across the reviews published in the last five years identified 10 studies that were included in two reviews each (out of 80 studies included across the most recent reviews). For the remaining categories, there was only low confidence that the association was robust, due to only weak or conflicting evidence being presented, or evidence only being available from older reviews and/or reviews with an unclear or high risk of bias.
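A cross-check for double-counting of the kind described above amounts to counting how many reviews include each primary study. The sketch below is illustrative only: the function name and the review and study identifiers are hypothetical, and the toy data are not taken from the included reviews.

```python
from collections import Counter
from typing import Dict, Iterable


def overlap_summary(included: Dict[str, Iterable[str]]) -> dict:
    """Summarize how many primary studies appear in more than one review.

    `included` maps a review identifier to the studies it includes.
    """
    # set() ensures a study listed twice within one review is counted once.
    counts = Counter(study for studies in included.values() for study in set(studies))
    duplicated = sorted(study for study, n in counts.items() if n > 1)
    share = len(duplicated) / len(counts) if counts else 0.0
    return {
        "unique_studies": len(counts),
        "duplicated_studies": duplicated,
        "share_duplicated": share,
    }


# Hypothetical toy data: study_3 is included in two reviews.
toy = {
    "review_A": ["study_1", "study_2", "study_3"],
    "review_B": ["study_3", "study_4"],
    "review_C": ["study_5"],
}
summary = overlap_summary(toy)
print(summary)
```

Applied to the figures reported above, 10 studies appearing in two reviews each out of 80 studies would correspond to a share of 12.5%, assuming the 80 refers to unique studies.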
Even for prognostic factors which were judged to have moderate confidence in the results, conflicting evidence was still presented. Indeed, for nearly all prognostic factors identified there was little consistency between the conclusions of reviews. For the prognostic factor baseline pain severity, where the evidence consistently did indicate an association with outcome, the variable quality of the reviews assessing this factor precluded high confidence in the findings.
Several prognostic factors (disability/activity limitation, general health, previous pain, pain intensity, fear-avoidance, coping and expectation of outcome/recovery) were investigated by at least 10 reviews. Again, only moderate to low confidence could be ascribed to these factors due to low or variable quality and also age of the reviews investigating each factor. We included systematic reviews published from 2008 onwards, and 12 were published in 2015 or later.

| DISCUSSION
Our umbrella review identified 41 systematic reviews summarizing the evidence for self-reported prognostic factors for pain, functional disability and return-to-work/work absence outcomes in neck or low back pain populations. From these reviews, we identified 25 prognostic factors where at least two systematic reviews had investigated that factor, and at least half of those reviews reported an association between the prognostic factor and at least one of our outcomes of interest. Seven of these factors (disability/activity limitation; mental health; pain intensity; pain severity; coping; expectation of outcome/recovery; and fear-avoidance) were judged as having moderate confidence for robust findings. The included reviews were heterogeneous in terms of populations included, setting, prognostic factors investigated and overall quality.
Pain intensity and pain severity are included as separate prognostic factors in this review. While it is recognized that many review authors, and indeed authors of the studies included within each review, may have used the terms interchangeably, there is some literature that denotes pain severity as an overarching term, of which different dimensions of pain (such as pain intensity, duration, and impact) are then examined (von Baeyer, 2006), and indeed some of the included reviews separately explored the prognostic value of different dimensions of pain (e.g. Kent & Keating, 2008).
Prognostic factors can be either treatment modifiable or non-modifiable. Although non-modifiable factors can help alongside modifiable factors to identify groups of patients who might need more intensive treatment, it is the modifiable factors alone that are potential treatment targets (Hill & Fritz, 2011). Of the factors identified in the present review, multisite pain may be considered a non-modifiable factor, while the remaining are potentially modifiable through treatment. For example, focusing on
TABLE 2 Summary of review findings (35 reviews investigating 25 prognostic factors)

Prognostic factor Reviews investigating the Prognostic factor
Overall risk of bias (ROBIS)

Targeting specific factors in this way means that they can be tested as potential mediators of treatment effect (Hill & Fritz, 2011).

Consistent (moderate-strong evidence): no association | Conflicting/inconsistent, or weak/limited evidence | Confidence in conclusions based on all review findings (GRADE): high/moderate/low/very low
1 | 5 (2 low RoB - Kamper et al., 2008; Kent & Keating, 2008) | Moderate confidence - of the 8 reviews reporting an association, 2 were low RoB (1 recent)
0 | 2 (1 low RoB - Shearer et al., 2020) | Low confidence - 2 reviews, both limited/conflicting evidence
0 | 3 (2 low RoB - Kent & Keating, 2008; Verwoerd et al., 2019) | Low confidence - of the 3 reviews reporting an association, 1 was low RoB and not recent, plus 3 reviews presenting conflicting/limited evidence
0 | 1 | Low confidence - 2 recent reviews and 1 low RoB
1 | 1 | Low confidence - 2 reviews (1 low RoB) but others conflicting
0 | 1 (low RoB - Kent & Keating, 2008) | Low confidence - 1 review found an association but not recent and unclear RoB
0 | 1 | Low confidence - 1 recent review (low RoB) found an association but conflicting evidence also presented
0 | 2 (1 low RoB - Ashworth et al., 2011) | Low confidence - 1 recent review (low RoB) found an association but 2 presented conflicting/limited evidence
0 | 3 (3 low RoB - Ashworth et al., 2011; Kent & Keating, 2008; Verwoerd et al., 2013; Verkerk et al., 2012) | Low confidence - of the 3 reviews reporting an association, 1 was recent and low RoB (but conflicting evidence from other reviews)
0 | 1 (low RoB - Kent & Keating, 2008) | Very low confidence - 1 review found an association but unclear RoB and not recent

The strengths of this umbrella review are the comprehensive, systematic search for reviews, undertaken by an experienced information specialist on the team, and the independent assessment of risk of bias of the included systematic reviews and subsequent grading of evidence.
The synthesis is limited by the heterogeneity of included reviews, which covered a range of settings, populations and prognostic factors. They also varied in quality, with some reviews using inadequate search strategies to locate relevant articles and few reviews being able to conduct a meta-analysis due to heterogeneity amongst their included studies. We used the ROBIS tool to critically appraise the included systematic reviews. This is a well-used tool to assess risk of bias of systematic reviews, but the domain assessing evidence synthesis focuses on meta-analysis, which was rarely conducted in our included reviews. The lack of meta-analyses was often due to this being considered inappropriate by the original review authors, given wide heterogeneity of prognostic factors, outcome measures and analysis methods. This again highlights the impact of heterogeneity when aiming to summarize evidence for prognostic factors in NLBP.
In conclusion, for seven self-reported factors (disability/activity limitation; mental health; pain intensity; pain severity; coping; expectation of outcome/recovery; and fear-avoidance) we found consistent evidence for their association with outcomes of pain, disability and/or return-to-work or work absence in NLBP. The available evidence is heterogeneous and while 20 additional prognostic factors were identified, the quality and age of the reviews investigating these factors mean that only low confidence could be ascribed to them. The results of this overview can inform clinical practice by offering evidence-based prognostic factors that may help identify vulnerable subgroups at increased risk of persistent back or neck pain. Future research can further investigate the impact of using such prognostic information on treatment and referral decisions, patient outcomes and costs of care.