Baseline patient reported outcomes are more consistent predictors of long-term functional disability than laboratory, imaging or joint count data in patients with early inflammatory arthritis: A systematic review

Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/semarthrit

Objective: To assess baseline predictors of long-term functional disability in patients with inflammatory arthritis (IA).
Methods: We conducted a systematic review of the literature from 1990 to 2017 using MEDLINE and EMBASE. Studies were included if (i) they were prospective observational studies, (ii) all patients had IA with symptom duration ≤2 years at baseline, (iii) follow-up was at least 5 years, and (iv) baseline predictors of Health Assessment Questionnaire (HAQ) score at long-term follow-up (i.e., ≥5 years after baseline) were assessed. Information on the included studies and estimates of the association between baseline variables and long-term HAQ scores were extracted from the full manuscripts.
Results: Of 1037 abstracts identified by the search strategy, 37 met the inclusion/exclusion criteria and were included in the review. Older age at baseline and female gender were reported to be associated with higher long-term HAQ scores in the majority of studies assessing these relationships, as were higher baseline HAQ and pain scores (total patients included in analyses reporting significant associations/total patients analysed: age 9.8k/10.7k (91.6%); gender 9.9k/11.3k (87.4%); HAQ 4.0k/4.0k (99.0%); pain 2.8k/2.9k (93.6%)). Tender joint count, erythrocyte sedimentation rate (ESR) and DAS28 were also reported to predict long-term HAQ score; other disease activity measures were less consistent (tender joints 2.1k/2.5k (84.5%); ESR 1.6k/2.2k (72.3%); DAS28 888/1.1k (79.2%); swollen joints 684/2.6k (26.6%); C-reactive protein 279/510 (54.7%)). Rheumatoid factor (RF) and erosions were not useful predictors (RF 546/4.6k (11.9%); erosions 191/2.7k (7.0%)), whereas the results for anti-citrullinated protein antibody (ACPA) positivity were equivocal (ACPA 2.0k/3.8k (52.9%)).
Conclusions: Baseline age, gender, HAQ and pain scores are associated with long-term disability, and knowledge of these may aid the assessment of prognosis.


Introduction
Inflammatory arthritis (IA), and its subset rheumatoid arthritis (RA), are chronic conditions characterised by synovial joint inflammation [1]. Negative outcomes associated with these conditions include premature mortality [2,3], joint destruction [4,5], and functional disability [6-8]. The term functional disability refers to the difficulties patients with IA have in performing everyday tasks. Preventing or minimising functional disability is a key goal in IA management.
In the past, functional disability was assessed using the Steinbrocker Functional Class system, in which the physician graded the patient from class 1 (little or no disability) to class 4 (bed-ridden or confined to a wheelchair) [9]. Whilst this system was quick and reflected clinicians' judgement, having only four levels of disability made the measure insensitive to change [10]. The Health Assessment Questionnaire Disability Index (HAQ) was developed later [11]. The HAQ comprises 20 questions in eight subsections assessing different aspects of everyday life and yields a score from 0 to 3, where 0 indicates no disability and 3 indicates substantial disability. The HAQ has become the gold standard for measuring disability in patients with IA and has been shown to be a valid measure of disability [12,13]. The minimal clinically important difference was estimated to be between 0.20 and 0.22 [14], although later estimates within an observational cohort setting have put it as low as 0.09 [15].
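The paragraph above states that 20 items in eight subsections yield a 0-3 score without saying how. The sketch below follows the conventional HAQ-DI scoring rules as we understand them, which are assumptions here rather than statements from this review: each item is scored 0-3, each category takes its highest item score, a category is raised to at least 2 when an aid, device or help from another person is used, and the index is the mean of the answered categories, with at least six required. The category names, the `haq_di` function and the example responses are illustrative.

```python
from typing import Dict, List, Mapping, Optional, Sequence

# Assumed HAQ-DI structure: eight categories, 20 items in total.
HAQ_CATEGORIES: Dict[str, int] = {
    "dressing": 2, "arising": 2, "eating": 3, "walking": 2,
    "hygiene": 3, "reach": 2, "grip": 3, "activities": 3,
}


def haq_di(
    responses: Mapping[str, Sequence[Optional[int]]],
    aids: Optional[Mapping[str, bool]] = None,
) -> Optional[float]:
    """Return a HAQ-DI score in [0, 3], or None if fewer than six categories are answered.

    Item responses: 0 = without any difficulty, 1 = with some difficulty,
    2 = with much difficulty, 3 = unable to do, None = unanswered.
    """
    aids = aids or {}
    category_scores: List[int] = []
    for name, n_items in HAQ_CATEGORIES.items():
        items = [r for r in responses.get(name, []) if r is not None]
        if len(items) > n_items or any(r not in (0, 1, 2, 3) for r in items):
            raise ValueError(f"invalid responses for category {name!r}")
        if not items:
            continue  # unanswered category: left out of the mean
        score = max(items)  # category score = worst (highest) item score
        if aids.get(name, False):
            score = max(score, 2)  # assumed aids/devices adjustment
        category_scores.append(score)
    if len(category_scores) < 6:
        return None  # assumed minimum of six answered categories
    return sum(category_scores) / len(category_scores)


example = {
    "dressing": [1, 0], "arising": [1, 1], "eating": [0, 1, 1],
    "walking": [3, 2], "hygiene": [1, 0, 0], "reach": [1, 1],
    "grip": [0, 1, 0], "activities": [1, 1, 0],
}
print(haq_di(example))                       # (1+1+1+3+1+1+1+1) / 8 = 1.25
print(haq_di(example, aids={"grip": True}))  # grip raised to 2: 11 / 8 = 1.375
```

Taking the worst item per category, rather than averaging items, is what keeps the index on the same 0-3 scale as the individual answers.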
Longitudinally, functional disability measured using the HAQ has been shown to follow a J-shaped trajectory, with initial improvements in disability one to two years after symptom onset, followed by increasing HAQ scores over the subsequent 5-10 years [6]. Being able to predict which patients are likely to develop major problems in performing daily tasks is useful for both patients and clinicians. Clinicians can target patients at risk of high levels of long-term disability for additional interventions alongside their pharmacological therapy, and patients may be able to modify their lifestyle to reduce future disability. A systematic review of predictors of HAQ score in patients with RA was published in 2003 [16]. A further literature review, published in 2010, included studies of patients with a range of disease durations at baseline (<1 to 12 years) and follow-up lengths (1-15 years) [17]. However, the latter was not a systematic review, and a number of additional manuscripts investigating predictors of functional disability have been published since 2003. Furthermore, because of the J-shaped trajectory of functional disability, baseline predictors of short-term (i.e., between 0 and 5 years) HAQ score may not be the same as predictors of long-term (i.e., ≥5 years) HAQ. It is therefore important to consider predictors of long-term functional disability separately from predictors of short-term functional disability, as measured by the HAQ.
The aim of this systematic review was to critically evaluate the available literature on baseline predictors of long-term (i.e., ≥5 years) functional disability in patients with early IA.

Methods
To address this aim, we performed a systematic review using the MEDLINE and EMBASE databases, including studies published between 01/01/1990 and 05/10/2017. The inclusion criteria were: (i) all patients had IA (≥2 swollen joints lasting for ≥4 weeks), RA (defined as meeting any of the published criteria sets [18-20]), or undifferentiated arthritis; (ii) all patients had a symptom duration of two years or less at baseline; (iii) the analysis assessed baseline predictors of long-term functional disability measured using the HAQ at ≥5 years after baseline; (iv) the study was observational; and (v) the study was published in English (or an English translation was available). Exclusion criteria were: (i) randomised controlled trials, other clinical trials, cross-sectional studies or case series; (ii) studies including children; (iii) studies of non-human animals; and (iv) conference abstracts. The study was designed and reported according to the PRISMA guidelines [21].
A search strategy was devised that included both text words and MeSH terms (Supplementary file 1). This search strategy yielded 1037 titles and abstracts, 532 from MEDLINE and 505 from EMBASE. Of these, 263 were identified as duplicates by reference-management software (EndNote) and removed.
Each of the remaining titles and abstracts was screened independently against the inclusion and exclusion criteria by two reviewers (JG and CS) using a standardised form. Where the two reviewers disagreed (n = 53), a third reviewer (SV) was consulted. Of the 774 titles and abstracts screened, 73 met the inclusion criteria, and their full manuscripts were read by the same reviewers. Of these, 33 papers were included in the review. The reference lists of these manuscripts were then screened, and four additional studies were added, giving a total of 37 included studies (Fig. 1).
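The selection counts reported above can be checked with a simple PRISMA-style tally. The sketch below uses only the numbers given in the text; the two exclusion counts (701 at title/abstract screening and 40 at full-text review) are derived by subtraction rather than reported, and the function name `prisma_counts` is illustrative.

```python
from typing import Dict


def prisma_counts(medline: int, embase: int, duplicates: int, full_text: int,
                  included_from_search: int, from_reference_lists: int) -> Dict[str, int]:
    """Tally the record flow through a PRISMA-style study selection."""
    identified = medline + embase           # records retrieved from both databases
    screened = identified - duplicates      # unique titles/abstracts screened
    return {
        "identified": identified,
        "screened": screened,
        "excluded_at_screening": screened - full_text,
        "excluded_at_full_text": full_text - included_from_search,
        "included": included_from_search + from_reference_lists,
    }


counts = prisma_counts(medline=532, embase=505, duplicates=263, full_text=73,
                       included_from_search=33, from_reference_lists=4)
for stage, n in counts.items():
    print(f"{stage}: {n}")
```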

Quality assessment
Two reviewers assessed the conduct and reporting of each study using a system adapted from Pasma et al. [22]. Details on the methods and results of the quality assessment can be found in Supplementary file 2.

Data abstraction
A data abstraction form was created to extract and summarise information from each included study (Supplementary file 3), including: the number of patients in each study, the length of follow-up, age, gender, baseline and follow-up HAQ scores, and information on analyses assessing the association between baseline predictors and follow-up HAQ score. The predictors of long-term HAQ score were grouped into five categories and presented in tables: demographics, patient reported outcomes, disease activity, autoantibody status and miscellaneous. Each of these tables (i.e., other than Tables 1 and 2) presents results from studies that performed multivariable analyses first, followed by studies that performed only univariable analyses. Within these subsections, the studies were sorted by sample size. The statistical method of each analysis is reported, followed by effect sizes with 95% confidence intervals.
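The ordering rule for the predictor tables can be expressed as a two-part sort key. This is a minimal sketch assuming that "sorted by sample size" means largest first, which matches the Discussion's statement that analyses with high power are presented first; the `Analysis` record, the study labels and the sample sizes are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Analysis:
    study: str           # hypothetical study label
    n: int               # patients included in the analysis
    multivariable: bool  # True if the analysis adjusted for covariates


def table_order(analyses: List[Analysis]) -> List[Analysis]:
    # Multivariable analyses first (False sorts before True),
    # then largest sample first within each set.
    return sorted(analyses, key=lambda a: (not a.multivariable, -a.n))


rows = [
    Analysis("A", 191, False),
    Analysis("B", 1400, True),
    Analysis("C", 63, True),
    Analysis("D", 985, False),
]
print([a.study for a in table_order(rows)])  # ['B', 'C', 'D', 'A']
```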

Results
A summary of the included studies (N = 37), including demographics, follow-up lengths and baseline and final follow-up HAQ scores, is presented in Table 1. The studies are presented in alphabetical order of first author to aid cross-referencing between tables. Sample sizes ranged from n = 25 [23] to n = 3666 [24], and follow-up duration from 5 to 20 years (median (IQR) = 6 (5, 10) years). The median age of the patients ranged from 39.1 [25] to 55.6 years [26] (median = 53 years; 27/37 studies reported median age for the entire cohort). The proportion of women ranged from 62% [27,28] to 100% [29] (median = 66%; 33/37 studies reported the proportion of women). Table 2 summarises the results for each of the predictors assessed in the review.

Assessment of baseline predictors

Demographics
The majority of studies assessing the association between age and long-term HAQ score reported that older age at symptom onset was associated with higher HAQ scores at long-term follow-up (18 studies in total; 13 (72%) reported a significant association, including 11 multivariable analyses) (Table 3). In total, 10.7k patients were included, of whom 9.8k were included in analyses that reported a significant association (91.6%). The largest study (N = 3666) assessed the association between age and HAQ scores over 15 years. Compared with men aged <55 years, the HAQ scores of men aged 55-74 years were, on average, 0.19 (95% CI: -0.01, 0.39) higher and those of men aged ≥75 years were, on average, 1.81 (95% CI: 1.25, 2.36) higher. Older women also had higher HAQ scores than younger women, but to a lesser degree (mean difference (95% CI): <55 years = reference; 55-74 years = 0.26 (0.12, 0.40); ≥75 years = 0.51 (0.05, 0.98)) [24].
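When a study reports a mean difference with a 95% CI but no p-value, an approximate standard error and two-sided p-value can be recovered, assuming a symmetric, normal-theory interval. Applied to the age-group estimates from [24], the sketch below shows that the 0.19 difference for men aged 55-74 is not significant at the 5% level (its interval includes zero), whereas both differences for women are. The published limits are rounded, so the results are approximate; the function name `se_and_p_from_ci` is illustrative.

```python
import math
from typing import Tuple

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def se_and_p_from_ci(estimate: float, lower: float, upper: float) -> Tuple[float, float]:
    """Back-calculate SE and a two-sided p-value from a symmetric 95% CI."""
    se = (upper - lower) / (2 * Z_975)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, p


# Mean differences in HAQ vs. the <55 years group, as reported in [24]
results = {
    "men 55-74": (0.19, -0.01, 0.39),
    "men >=75": (1.81, 1.25, 2.36),
    "women 55-74": (0.26, 0.12, 0.40),
    "women >=75": (0.51, 0.05, 0.98),
}
for group, (est, lo, hi) in results.items():
    se, p = se_and_p_from_ci(est, lo, hi)
    print(f"{group}: SE = {se:.3f}, p = {p:.4f}")
```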
A majority of studies investigating the association between gender and later HAQ scores reported that women had significantly higher long-term HAQ scores than men. In total, 21 studies assessed the association between gender and subsequent HAQ scores. Of these, 11 analyses (9 multivariable) reported that women had significantly higher HAQ scores at long-term follow-up than men; one multivariable analysis reported that men had significantly higher HAQ scores than women; three multivariable analyses reported a significant association, but the direction of the association was unclear (i.e., the coefficient was labelled "gender" and the reference category (men/women) was not clearly reported); and six analyses (4 multivariable) reported no significant association between gender and future HAQ score (Table 3). In total, 11.3k patients were included, of whom 9.9k were included in analyses that reported a significant association (87.4%). Among studies reporting a significant association between female gender and higher HAQ score from linear regression analysis, the average difference between the HAQ scores of women and men ranged from 0.08 [30] to 0.38 [31]. A study by Malm et al. of 1.4k patients followed for 15 years reported that women had approximately two and a half times the odds of men of having a HAQ score over 0.75 at the 15-year assessment (OR 2.53, 95% CI: 1.85, 3.46) [32].
Table 2 notes: the reproductive factors and other biomarkers included are heterogeneous, and therefore their analysis populations were not summed. Key: ✓✓ = ≥85% of total participants in studies reporting a significant association and ≥2000 total participants studied; ✓ = ≥60% and <85% of total participants in studies reporting a significant association and ≥2000 total participants studied; - = ≥40% and <60%, or <2000 total participants; ✗ = ≥15% and <40% of total participants in studies reporting a significant association and ≥2000 total participants studied; ✗✗ = <15% of total participants in studies reporting a significant association and ≥2000 total participants studied.
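The Table 2 key can be written as a small decision rule. The sketch below applies it to the totals and percentages given in the abstract; the resulting symbols are our reading of the key, not a reproduction of Table 2, and the totals are rounded (e.g. 10.7k taken as 10,700). Each lower bound is treated as inclusive (≥), as the key states, and any predictor with fewer than 2000 participants is graded "-" regardless of its percentage. The function name `evidence_symbol` is illustrative.

```python
from typing import Dict, Tuple


def evidence_symbol(pct_significant: float, total_participants: int) -> str:
    """Map a predictor's summary to a Table 2 key symbol (our reading of the key)."""
    if total_participants < 2000:
        return "-"  # too few participants to grade in either direction
    if pct_significant >= 85:
        return "✓✓"
    if pct_significant >= 60:
        return "✓"
    if pct_significant >= 40:
        return "-"
    if pct_significant >= 15:
        return "✗"
    return "✗✗"


# (% of participants in analyses reporting a significant association, total participants)
summaries: Dict[str, Tuple[float, int]] = {
    "age": (91.6, 10_700), "gender": (87.4, 11_300), "HAQ": (99.0, 4_000),
    "pain": (93.6, 2_900), "tender joints": (84.5, 2_500), "ESR": (72.3, 2_200),
    "DAS28": (79.2, 1_100), "swollen joints": (26.6, 2_600), "CRP": (54.7, 510),
    "RF": (11.9, 4_600), "erosions": (7.0, 2_700), "ACPA": (52.9, 3_800),
}
for predictor, (pct, total) in summaries.items():
    print(f"{predictor}: {evidence_symbol(pct, total)}")
```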
Patient reported outcomes
Nine (7 multivariable) of the 10 studies that investigated the relationship reported a positive association between higher disability at baseline and higher disability at long-term follow-up, whilst the remaining multivariable analysis approached significance (Table 4). Nine of these reported a positive association between baseline HAQ and follow-up HAQ, whilst one used an alternative measure of baseline functional disability [33]. In total, 4.0k patients were included, of whom 3.97k were included in analyses that reported a significant association (99.0%). One study (N = 191) reported that each unit increase in baseline HAQ score was associated with a 0.39 (p = 0.0001) increase in HAQ score at five years [34]. Another study (N = 1.4k) reported that each unit increase in baseline HAQ was associated with 3.57 (95% CI: 2.84, 4.49) times the odds of having HAQ > 0.75 at the 15-year assessment [32]. Four out of six analyses (all multivariable) assessing the relationship reported a positive association between baseline pain visual analogue scale (VAS) scores and follow-up HAQ scores (Table 4). In total, 2.9k patients were included, of whom 2.8k were included in analyses that reported a significant association (93.6%). One study using generalised estimating equations over eight years of follow-up reported that each centimetre increase in baseline pain VAS was associated with an average increase in HAQ score of 0.06 over follow-up [35]. Two studies reported a 2% increase in the odds of being in a higher HAQ category at follow-up per millimetre increase in baseline pain VAS, one after seven years of follow-up [36] and the other after 15 years [32].
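Per-unit effects such as "0.06 HAQ per centimetre" or "2% higher odds per millimetre" are easier to judge over a clinically meaningful difference. A linear coefficient scales by multiplication (β × Δ), whereas an odds ratio scales by exponentiation (OR^Δ), assuming the model is linear on the log-odds scale, as the reported per-unit ORs imply. The sketch below applies this to the pain VAS estimates and to the per-minute morning stiffness OR reported later in this section. Reported ORs are rounded (e.g. "2%"), and rescaling over many units amplifies that rounding. The function names are illustrative.

```python
import math


def rescale_linear(beta_per_unit: float, delta: float) -> float:
    """Predicted HAQ difference for a `delta`-unit difference in the predictor."""
    return beta_per_unit * delta


def rescale_odds_ratio(or_per_unit: float, delta: float) -> float:
    """OR for a `delta`-unit difference, assuming linearity on the log-odds scale."""
    return math.exp(delta * math.log(or_per_unit))


print(rescale_linear(0.06, 5))        # pain VAS: 0.06 HAQ per cm [35], 5 cm difference
print(rescale_odds_ratio(1.02, 10))   # pain VAS: OR 1.02 per mm [32,36], 10 mm difference
print(rescale_odds_ratio(1.008, 60))  # morning stiffness: OR 1.008 per minute [36], 60 minutes
```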

Disease activity
One multivariable analysis reported a significant positive association between baseline swollen joint count and follow-up HAQ scores [37], whilst four other analyses (two multivariable) reported no association. In total, 2.6k patients were included, of which 684 were included in analyses that reported significant results (26.6%).
Two multivariable analyses reported a significant positive association between baseline tender joint count and follow-up HAQ score, whilst two other multivariable analyses did not report a significant association (Table 5). In total, 2.5k patients were included, of whom 2.1k were included in analyses that reported significant results (84.5%).
Furthermore, two small studies (N = 191 and 42) reported a positive association between baseline Ritchie Index (which includes a measure of joint tenderness) and subsequent HAQ scores [27,34]. Thus, the evidence suggests that baseline tender joint count may be a useful predictor of long-term HAQ scores, whereas baseline swollen joint count is unlikely to predict long-term disability.
Two studies (one multivariable) reported that higher C-reactive protein (CRP) level was associated with higher long-term functional disability, whilst two multivariable analyses reported no significant association. In total, 510 patients were included, of which 279 were included in analyses that reported significant results (54.7%).
Two multivariable analyses reported a significant association between higher baseline erythrocyte sedimentation rate (ESR) and higher follow-up HAQ score, although with small effect sizes (HAQ > 0.75 at 15 years: OR 1.01 per unit increase in baseline ESR, 95% CI: 1.002, 1.012 [32]; mean increase in HAQ score at 5 years: 0.008 per unit increase in baseline ESR, p = 0.006 [34]). Four smaller analyses (three multivariable) reported no significant association (Table 6). In total, 2.2k patients were included, of whom 1.6k were included in analyses that reported significant results (72.3%). Thus, the evidence on the relationship between higher CRP and long-term functional disability is inconsistent, whereas ESR is likely to be a weak predictor of HAQ score.
Of the five studies that assessed the association, three univariable analyses and one multivariable analysis reported a positive association between baseline Disease Activity Score in 28 joints (DAS28) and follow-up HAQ scores, whilst one multivariable analysis did not report a significant association (Table 5). In total, 1.1k patients were included, of whom 888 were included in analyses that reported a significant association (79.2%). Among analyses reporting significant associations from linear regression, the average increase in follow-up HAQ score per unit increase in baseline DAS28 ranged from 0.100 [38] to 0.130 [39].
The largest analysis assessing the association between anti-citrullinated protein antibody (ACPA) positivity and subsequent HAQ scores reported a significant association (N = 1995; mean difference in HAQ between ACPA+ and ACPA- patients = 0.12 (95% CI: 0.02, 0.21)) [40], but five other analyses (two multivariable) found no association (Table 6). In total, 3.8k patients were included, of whom 2.0k were included in analyses that reported significant results (52.9%). Thus, at present, the literature is equivocal as to whether ACPA positivity is a useful predictor of increased long-term functional disability.

Erosions
Four out of five studies (three multivariable) reported no significant association between erosion score at baseline and long-term HAQ score (Table 7). One univariable analysis reported a significant correlation [34], but with a low Spearman's rho (ρ = 0.167), indicating a weak relationship. Of the 2.7k patients included in analyses assessing the association, only 191 were included in the analysis that reported a significant association (7.0%). Therefore, based on current evidence, baseline erosions are not a predictor of long-term functional disability in patients with early inflammatory arthritis.

Morning stiffness
All three analyses (one multivariable; total patients included = 424) that assessed the relationship between morning stiffness and long-term HAQ score reported a positive association (Table 7). One study reported a low Spearman's rho (ρ = 0.211), indicating a weak relationship [34]; another reported an approximately 1% increase in the odds of having HAQ > 1 after seven years per minute increase in baseline morning stiffness, relative to no morning stiffness (maximum = 180 minutes; OR 1.008, 95% CI: 1.001, 1.016) [36]. This suggests that baseline morning stiffness may be weakly associated with long-term functional disability, but all the studies reporting on this relationship were relatively small (N < 200).

Genetic factors
Eight analyses (four multivariable) assessed the association between RA susceptibility genes (HLA and PTPN22 variants) and long-term functional disability (Table 7). One large study reported significant associations between different amino acids at positions 11, 71 and 74 of HLA-DRB1 and small increases or decreases in disability over five years [41]. Of the other studies, seven examined different HLA regions as the independent variable and one examined PTPN22 variants [42], all reporting no significant associations. Therefore, the published literature suggests that specific amino acids at different positions of the HLA-DRB1 gene are weakly associated with long-term disability. Other genes within the HLA region do not appear to predict long-term functional disability, and there is little research on other genetic regions.

Other factors
The one multivariable analysis (N = 1.6k) that assessed body mass index (BMI) as a predictor of long-term HAQ reported that each unit increase in baseline BMI was associated with a 0.2 increase in HAQ at 15 years of follow-up (Table 7) [26].
The one univariable analysis (N = 1.4k) that assessed whether immigrant status predicted long-term HAQ score reported that immigrants to Sweden had significantly higher HAQ scores after 15 years than non-immigrants (Table 7) [43]. Bansback et al. reported that patients in the highest (i.e., most deprived) category of the Carstairs Deprivation Index, a measure of socioeconomic status, at baseline had almost twice the adjusted odds of having HAQ > 1.5 after five years compared with those in the lowest category (OR 1.984, p = 0.044, N = 985) [44]. Eberhardt et al. (N = 63) reported that, compared with patients with 0-9 years of education, patients with 10-11 years had 13% lower odds of having HAQ > 1.0 at five years and those with ≥12 years of education had 26% lower odds, after adjusting for confounders [28].
Two studies reported on reproductive factors. One reported a significant association between being parous (vs. nulliparous) at baseline and lower HAQ score over 15 years of follow-up (N = 1.9k) [29], and the other reported that women who were postmenopausal at baseline had significantly higher HAQ scores six years later than women who were premenopausal (N = 332) [45]. However, the latter study did not control for age.
Four studies examined the association between other biomarkers and subsequent HAQ scores (Table 7). No association was found between antifilaggrin antibody, antiperinuclear factor, antikeratin or anticollagen type II antibody status and subsequent HAQ scores [23,46,47]. However, anti-carbamylated protein antibody positivity and being in the highest tertile of sE-selectin level were associated with higher long-term HAQ score [40,48].

Discussion
This systematic review identified 37 studies that assessed the association between a total of 20 baseline variables and subsequent long-term functional disability, as measured by the HAQ, in patients with inflammatory arthritis. There was highly consistent evidence of an association between female gender, higher baseline age and higher baseline HAQ score and subsequent higher HAQ scores. There was moderately consistent evidence of an association between higher baseline pain, DAS28 and morning stiffness and subsequent increased HAQ score. However, in general, studies reported weak or no association between higher baseline swollen joint count, erosions, HLA genetic variants or RF positivity and later HAQ scores. The literature is equivocal regarding the relationship between ACPA positivity and subsequent HAQ scores. The findings of this review are in agreement with a review carried out by Scott et al. in 2003, which reported that women and those of older age at baseline were more likely to have high disability in the future [16]. Scott et al. also found that higher pain at baseline was associated with higher subsequent disability. However, Scott et al. reported that RF positivity and a high number of erosions were associated with increased disability at follow-up. The association between more erosions and subsequent higher HAQ score was also reported in a review by Bombardier et al. [59]. This discrepancy is likely to arise because both of these previous reviews included patients with any disease duration, whilst the current review only included studies confined to patients with early arthritis (symptom duration ≤2 years at baseline), who may not yet have developed erosions.
Baseline HAQ score was the only variable that was shown to be associated with higher HAQ score at follow-up consistently across all studies assessing the relationship (with nine studies reporting a significant association and one study trending towards significance). Higher levels of pain and morning stiffness at baseline may also be useful predictors of subsequent higher HAQ score, although the evidence for this is weaker. Furthermore, four out of five studies assessing the relationship reported a significant relationship between baseline DAS28 and later functional disability. However, the longest follow-up of these studies was six years.
Also of clinical interest are the results of studies assessing the association between RF and ACPA positivity and later HAQ scores. None of the three large cohort studies with over 1000 patients at baseline reported a significant association between RF positivity and later higher HAQ scores and only one study out of six reported a significant association between ACPA positivity at baseline and later higher HAQ scores. However, this was by far the largest study to assess the association, including almost 2000 patients in the analysis [40].
This review has a number of strengths. Limiting the review to studies of patients with early arthritis allows us to examine which factors early in the disease process predict later functional disability. Furthermore, we have stratified the presentation of results into multivariable and univariable analyses, and then sorted within these sets by the sample size of the studies. Therefore, analyses with high power that control for confounding are presented first, allowing the reader to assess the quality of the studies presented easily. A drawback of this review is that a meta-analysis could not be performed because of the heterogeneity between the studies. Almost every study assessed the association between baseline variables and subsequent HAQ scores in a different way, using different analysis techniques and controlling for different combinations of covariates, so any meta-analysis combining these studies would be uninterpretable. Furthermore, we included all studies published since 1990 that met the inclusion criteria. Thus, secular trends in disease severity could be influencing the results of the review [60,61], and differences in the available treatments and treatment strategies over time may mean that studies published over this period are not comparable.
The majority of studies included within the review were judged to be of moderate quality (Supplementary File 2). Studies often did not report the amount of missing data, and others used complete-case analyses, which can yield biased estimates when data are not missing completely at random. Furthermore, studies often included covariates in analyses but only reported on the primary predictor defined in the research question. These covariates therefore could not be included in the review, despite contributing to the analyses.
In conclusion, this review has shown that female gender and higher baseline age, HAQ score, pain score and duration of morning stiffness have been consistently reported to predict increased long-term functional disability. Furthermore, most studies assessing these associations reported no association between RF positivity or baseline erosions and long-term disability in patients with early IA. This study indicates the relative importance of patient reported outcomes over blood test results in predicting the long-term prognosis (in terms of physical disability) of patients with IA.