Disease Activity Indices in Rheumatoid Arthritis: Comparative Performance to Detect Changes in Function, IL-6 Levels, and Radiographic Progression

Objective: To compare the capacity of various disease activity indices to evaluate changes in function, IL-6 levels, and radiographic progression in early and established rheumatoid arthritis (RA). Methods: Secondary data analysis of a clinical trial assessing the efficacy of tocilizumab in patients with established RA (ACT-RAY) and a longitudinal prospective register of early arthritis (PEARL). Targeted outcomes were changes in physical function, measured with the health assessment questionnaire (HAQ), IL-6 serum levels, and radiographic progression. The “Hospital Universitario La Princesa Index” (HUPI), DAS28 using erythrocyte sedimentation rate, and SDAI were the disease activity indices compared. Models adjusted for age and sex were fitted for each outcome and index and ranked based on the R² parameter and the quasi-likelihood under the independence model criterion. Results: Data from 8,090 visits (550 patients) from ACT-RAY and 775 visits (534 patients) from PEARL were analyzed. The best performing models for HAQ were the HUPI (R² = 0.351) and SDAI ones (R² = 0.329). For serum IL-6 levels, the SDAI model performed best (R² = 0.208), followed by the HUPI model (R² = 0.205). For radiographic progression in ACT-RAY, the HUPI (R² = 0.034) and the DAS28 models (R² = 0.026) performed best, whereas the DAS28 (R² = 0.030) and HUPI models (R² = 0.023) did so in PEARL. Conclusions: HUPI outperformed other indices in identifying changes in HAQ and radiographic progression and performed similarly to SDAI for IL-6 serum levels.


INTRODUCTION
Routine management of rheumatoid arthritis (RA) using the treat-to-target (1) and tight-control (2) strategies requires validated tools to measure disease activity. The most frequently used measures in randomized clinical trials (RCT) are the disease activity score of 28 joints (DAS28) (3), the simplified disease activity index (SDAI) (4), and the clinical disease activity index (CDAI) (5). Although extensively validated, these indices exhibit some limitations. Different cohorts have shown that DAS28 and SDAI may be sex-biased, as they include a pain rating and erythrocyte sedimentation rate (ESR), both usually higher in women. This potential bias could lead rheumatologists to over-treat women with RA (6,7).
To overcome these limitations, the "Hospital Universitario La Princesa Index" (HUPI) was developed and validated (8-10). HUPI includes the same variables as DAS28, but it can be calculated with ESR, C-reactive protein (CRP), or both, depending on their availability, as a way to tackle missing data (8). This index, developed and validated in an early arthritis cohort (8), has disease activity cut-offs with higher areas under the curve than those of DAS28, SDAI, and CDAI (9). HUPI's responsiveness was evaluated against the other disease activity indices in three different scenarios, namely an RCT and two different RA cohorts, including patients with early and established disease. The responsiveness was similar to that of DAS28-CRP and better than that of the remaining indices, with response criteria that are more stringent than those of EULAR (10). Based on its psychometric properties, the 2019 update of the American College of Rheumatology recommended RA disease activity measures included HUPI among the indices that fulfil minimum standards for regular use in most clinical settings (11).
Nowadays, the importance of an early diagnosis and treatment in patients with RA is well-established (12,13). However, to offer patients a tailored therapy aimed at improving efficacy and reducing side effects, we need reliable measures of what is happening now (assessment) and what will happen in the future (prediction). Accordingly, we hypothesized that HUPI's performance to identify unbiased changes in disease activity makes it more suitable to assess changes in relevant outcomes and surrogates of inflammation (14,15). The objective of this study was to compare the capacity of HUPI and other indices to identify changes in (i) physical function, (ii) serum levels of interleukin-6 (IL-6), and (iii) radiographic progression in patients with early and established RA.

METHODS
This study is a secondary data analysis of an early arthritis cohort and an RCT in established RA.

Study Population
The ACT-RAY Trial
The main characteristics of the ACT-RAY trial have been previously reported (16). In summary, this is a 3-year double-blind RCT designed to evaluate the efficacy and safety of tocilizumab (TCZ) plus methotrexate vs. TCZ monotherapy in patients with established RA with inadequate response to methotrexate. The study included patients fulfilling the ACR 1987 criteria with a DAS28 > 4.4 and erosive disease. Data on demographics, disease activity variables, and laboratory data were collected every 4 weeks from baseline until the end of the study. Since there were no statistically significant differences in clinical response between arms, we included all patients' data regardless of the allocation group up to week 52 when, according to the protocol, patients in sustained remission discontinued treatment with TCZ (16).

The PEARL Cohort
This prospective cohort has been previously described (9). In summary, PEARL includes incident cases of early arthritis, with one or more swollen joints for less than a year. Patients are referred by their treating rheumatologist to an early arthritis clinic, in which patients undergo 5 visits (at baseline, 26, 52, 104, and 260 weeks) per protocol performed by the same two rheumatologists, which guarantees consistency in clinical examination, particularly joint counts.
Demographics, disease activity measures, and radiological data are routinely recorded in standardized forms. In addition, biological samples are systematically collected. Patients are treated according to their treating rheumatologist's criteria.
For the present study, we included patients either meeting the 1987 ACR criteria for RA (17) or classified as having undifferentiated arthritis (UA) (18) at the 24-month follow-up visit, from cohort inception (2000) until June 2019.

Variables
Physical function: This was measured with the Health Assessment Questionnaire-Disability Index (HAQ) in both datasets. This self-reported questionnaire was administered at every follow-up visit using cross-culturally validated versions (19,20).
Serum IL-6 Levels (pg/ml)
IL-6 was used as a surrogate for inflammation (14) and was available only in the PEARL study. It had been previously measured in frozen serum samples from PEARL patients using an enzyme-linked immunoassay (Quantikine® HS ELISA, R&D Systems®) according to the manufacturer's instructions, as previously described (21). The biobank of La Princesa University Hospital-Health Research Institute (ISS-IP) provided serum for this previous study. In the present work, we used these previous serum IL-6 measurements to analyze their relationship with the different indices studied.

Radiographic Progression
Plain X-rays were available to measure radiographic progression using the Genant-Sharp score in ACT-RAY (22) and the modified Sharp-van der Heijde score (23) (applied only to the hands) in PEARL. We analyzed only the variable of erosions because we consider it a more accurate reflection of changes due to RA alone, as opposed to changes in joint space narrowing, which have been shown to be strongly associated with age rather than disease activity (24). The variable of erosions was calculated as the difference in the respective scores between baseline and the 52-week visit for ACT-RAY and the 104-week follow-up visit for PEARL.

Statistical Analysis
Data from each of the two studies were analyzed independently.
Normally distributed variables were represented as mean and standard deviation (SD) and non-normally distributed variables as the median and interquartile range (IQR). Categorical variables were presented as numbers and proportions.
To assess the performance of HUPI, DAS28, and SDAI in explaining changes in the three outcomes mentioned, we developed models with each outcome as the dependent variable, adjusting for known potential confounders, such as age and sex (27). Only patients without missing data in any of these variables were included in the analysis. For all models, indices and age were standardized (centered and scaled by subtracting the variable mean from each record and dividing the result by the standard deviation), thereby allowing comparisons.
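The standardization step described above can be illustrated with a minimal Python sketch. The authors carried out their analysis in R, so this is only an illustration: the function name `standardize` and the example scores are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def standardize(values):
    """Center on the mean and scale by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: sample SD with an n - 1 denominator
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


# Hypothetical index scores, for illustration only (not study data)
das28_scores = [2.1, 3.4, 4.8, 5.6, 6.1]
z_scores = standardize(das28_scores)
```

After this transformation each predictor has mean 0 and standard deviation 1, so a β coefficient can be read as the change in the outcome per one standard deviation of the index, which is what makes coefficients comparable across indices measured on different scales.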
Models with HAQ as the dependent variable were developed in ACT-RAY and fitted using Generalized Estimating Equations (GEE), nesting visits within each patient. An unstructured variance-covariance matrix for fixed and residual terms was used to avoid assumptions on the variance-covariance structure. Models were ranked according to the R² parameter and the quasi-likelihood under the independence model criterion (QIC) (28). The model with the highest R² and the lowest QIC was selected as the best-ranked one. This ranking was then validated in PEARL using the R² parameter.
We used a similar approach to develop models for IL-6 serum levels as the dependent variable. As IL-6 levels were not collected in ACT-RAY, we used 80% of the PEARL population to establish the predicting model and the remaining 20% for its validation. This analysis was done with the R package "geepack" (29).
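A random 80/20 division of the population could be sketched as follows in Python. This is not the authors' code: the function `split_patients`, the seed, and the toy visit records are hypothetical. The sketch also assumes that the split is made at the patient level, so that all visits of one patient fall in the same subset; the text only states that 80% of the PEARL population was used for model development.

```python
import random


def split_patients(patient_ids, train_fraction=0.8, seed=0):
    """Randomly assign unique patients to a development or validation set."""
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(unique_ids)
    n_train = round(train_fraction * len(unique_ids))
    return set(unique_ids[:n_train]), set(unique_ids[n_train:])


# Toy data: 10 hypothetical patients with 3 visits each (not study data)
visits = [{"patient": p, "week": w} for p in range(1, 11) for w in (0, 26, 52)]
train_ids, valid_ids = split_patients(v["patient"] for v in visits)

# Keep every visit of a patient on the same side of the split
train_visits = [v for v in visits if v["patient"] in train_ids]
valid_visits = [v for v in visits if v["patient"] in valid_ids]
```

Splitting by patient rather than by visit avoids having repeated measurements from the same person in both subsets, which would make the validation subset less independent of the development subset.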
Finally, the models describing the relationship between erosions and the different indices were developed independently for ACT-RAY and PEARL, because of the previously described differences in their measurement. For these models, we obtained the mean value of each disease activity index over the entire follow-up, rather than the score at every visit, as done in the previous models. These mean values were categorized as follows: remission = 0, low = 1, moderate = 2, and high activity = 3, according to their respective cut-offs (3). Models were ranked by the R² parameter (R package stats) and Akaike's Information Criterion (AIC) (30,31), and the model with the highest R² and the lowest AIC was selected as the best-ranked one. The relative importance of each predictor was calculated by decomposing the R² value of the model into components corresponding to each predictor (R package r2glmm) (32). Linear models were used to analyze HAQ and IL-6, and quadratic ones were used for radiographic progression because they fitted the data better. Statistical analyses were conducted using R version 3.6.3 (27).
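The categorization of time-averaged index values into activity levels can be sketched in Python as shown here. The thresholds in `CUTOFFS` are the commonly cited values for DAS28 and SDAI and are an assumption: the cut-offs the authors actually applied, including those for HUPI, should be taken from the cited references. The function name and the example scores are hypothetical.

```python
from statistics import mean

# Assumed upper bounds for remission, low, and moderate activity.
# Commonly cited values, not taken from this article.
CUTOFFS = {
    "DAS28": (2.6, 3.2, 5.1),
    "SDAI": (3.3, 11.0, 26.0),
}


def activity_category(index_name, visit_scores):
    """Return 0 (remission), 1 (low), 2 (moderate), or 3 (high activity)."""
    average = mean(visit_scores)  # mean over the entire follow-up
    # Count how many thresholds the mean exceeds. A value exactly on a
    # threshold falls in the lower category, which is a simplification.
    return sum(average > cutoff for cutoff in CUTOFFS[index_name])


# Hypothetical DAS28 scores of one patient across visits (not study data)
category = activity_category("DAS28", [4.0, 5.0])
```

Using the follow-up mean collapses each patient to a single ordinal predictor, which suits an outcome such as erosion change that is itself measured once per patient over the same period.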

Ethical Considerations
This is a secondary analysis of anonymized data from patients included in the ACT-RAY and PEARL studies. The ACT-RAY trial was approved by the Ethics Committees of each participating center (see Acknowledgement section "ACT-RAY group") and the PEARL study was approved by the Ethics Committee for Clinical Research at the Hospital Universitario de La Princesa (PI-518; March 28th, 2011). All patients signed a written consent form before inclusion. Both studies were conducted according to the principles of the Helsinki Declaration (33).

RESULTS
The [...] Table 1 and the distribution of HAQ according to HUPI, DAS28, and SDAI in both study populations are shown in Figure 1 and Supplementary Figure 1.

Comparative Analysis of Indices With IL-6 Serum Levels as Outcome
Variations in the IL-6 serum level were initially modeled with a randomly selected 80% of the PEARL population. In these initial models, IL-6 levels were better explained when including SDAI or HUPI as predictors, with R² of 0.208 (QIC: 289.207) and 0.205 (QIC: 290.823), respectively, in comparison with an R² of 0.190 (QIC: 295.610) for DAS28 (Table 2). These results indicate that the former two explained ∼21% of the variance, while the latter explained 19%. The β coefficient for SDAI was 0.363 vs. 0.345 and 0.337 for HUPI and DAS28, respectively. Of note, the R² parameters of HUPI and DAS28 remained similar when applied to the validation cohort (the remaining 20% of the PEARL population), as opposed to the SDAI model, which changed from explaining ∼21% in the initial population to 18% in the validation population (Table 2). It is also noteworthy that sex only reached significance in the DAS28 model. Additional data of the models are presented in Table 2 and the distribution of IL-6 serum levels according to each index scale in Figure 2.
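The ranking rule applied to these models (highest R², lowest QIC) can be made explicit with a short Python sketch. It uses only the values reported in the paragraph above for the initial IL-6 models. The variable names are hypothetical and the code is an illustration, not the authors' procedure.

```python
# R2 and QIC of the initial IL-6 models, as reported in the text
models = {
    "SDAI": {"r2": 0.208, "qic": 289.207},
    "HUPI": {"r2": 0.205, "qic": 290.823},
    "DAS28": {"r2": 0.190, "qic": 295.610},
}

# A higher R2 means more variance explained; a lower QIC means a better fit
rank_by_r2 = sorted(models, key=lambda name: models[name]["r2"], reverse=True)
rank_by_qic = sorted(models, key=lambda name: models[name]["qic"])

# Express R2 as the percentage of variance explained
variance_explained = {name: round(100 * m["r2"], 1) for name, m in models.items()}

for name in rank_by_r2:
    print(name, variance_explained[name], models[name]["qic"])
```

Here both criteria yield the same order (SDAI, HUPI, DAS28), so no tie-breaking rule is needed. When the two criteria disagree, the choice of which one takes precedence would have to be stated separately.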

Comparative Analysis of Indices With Radiographic Progression as Outcome
As radiographic progression was evaluated using different methodologies in the two studies, we ran separate comparative analyses. As shown in Table 3, when analyzing data from the ACT-RAY study, the model including HUPI as an explanatory variable showed the best performance (R²: 0.034; AIC: 925.687), followed by the one with DAS28 (R²: 0.026; AIC: 928.793) and then the one using SDAI (R²: 0.017; AIC: 932.347). These results indicate that the HUPI model explained the variance of erosions slightly better (3.4%) than the models including DAS28 and SDAI (2.6 and 1.7%, respectively). When assessing partial R² parameters for each explanatory variable, HUPI and HUPI² explained 2% of the variance, while DAS28/DAS28² explained 1% and SDAI/SDAI² 0.2%. β coefficients for HUPI and HUPI² [...] Table 3.
In contrast, when using data from PEARL, none of the models were associated with radiographic progression. Results were R²: 0.030 (0.010-0.150), AIC: 347.520 for DAS28; R²: 0.023 (0.008-0.138), AIC: 348.413 for HUPI; and R²: 0.018 (0.007-0.131), AIC: 348.955 for SDAI. The model including DAS28 explained ∼3% of the variance, while those with HUPI and SDAI explained 2.3% and 1.8%, respectively. Partial R² parameters show that DAS28/DAS28² explained 2.5% of the overall variance, while HUPI and SDAI explained 1.7 and 1.3%, respectively. Additional details are shown in Supplementary Table 3. The distribution of the variable erosions in ACT-RAY and PEARL according to the different categories of each index is shown in Figure 3 and Supplementary Figure 2, respectively.

DISCUSSION
In this study, we evaluated the performance of HUPI in comparison to other traditional disease activity indices as explanatory variables for physical function decline measured by HAQ, inflammation, assessed by IL-6 serum levels, and radiographic progression measured by erosions. Our results indicate that HUPI performed well with most outcomes studied, being the best in explaining the decline in physical function and radiographic progression (ACT-RAY) and second-best for IL-6 serum levels. Of note, all indices performed poorly with regard to radiographic progression, mainly because both populations showed modest changes in their respective radiographic scores, as expected for early diagnosed, intensively treated patients.
Even though the models containing HUPI did not outperform their counterparts in all comparisons, they were the most consistent across the different proposed scenarios. The SDAI models performed best for IL-6 changes, probably because the weight of CRP is high in SDAI, but they ranked last for erosions. Similarly, DAS28 models worked best for explaining erosions in PEARL but ranked worst for HAQ and IL-6.
The association between HAQ and traditional indices (DAS28, SDAI, and CDAI) has been previously analyzed in a study by Aletaha et al. (34) with two observational cohorts, one including patients with established RA, and another with early arthritis. These analyses showed moderate and similar correlations for all indices (r = 0.45-0.47) for the former, and weaker for the latter cohort (r = 0.26-0.31). Another study pooling data from three RCTs showed moderate to good correlations with HAQ for SDAI and CDAI at baseline and after 6 months of follow-up (r = 0.36-0.66) (4). These observations are consistent with our results: SDAI and DAS28 performed quite similarly on HAQ assessment. Nonetheless, our data support a slight superiority of HUPI.
The association between indices and IL-6 levels has also been previously analyzed in a study by Madhok et al. (35) showing a weak correlation (r = 0.3) with the Ritchie Activity Index. In our study, initial models including all three indices performed similarly, with small differences favoring those including SDAI (with R² parameters ranging from 0.190 to 0.208). Notably, when validating these models with the remaining 20% of the data from PEARL, HUPI and DAS28 models performed better than SDAI.
Navarro-Compán et al. (36) summarized the association between disease activity indices and radiographic progression in a systematic review. The majority of studies reported a significant association, especially after adjustment by time. However, this review did not carry out comparative analyses between indices. Aletaha et al. (34) assessed the linear correlation between time-averaged DAS28, SDAI, and CDAI and radiographic progression (measured with the Larsen score) after 3 years of follow-up, and found similar moderate correlations, with r coefficients ranging from 0.54 to 0.59. Of note, in this study, no GEE modeling was carried out. Klarenbeek et al. (37) using 5-year data from the BeST study, found similar results after assessing the association of different indices with radiographic progression, using the Sharp-van der Heijde score, and HAQ. These authors ran GEE models to analyze different scenarios for both outcomes and found that all associations were highly comparable. Despite the limited radiographic progression in ACT-RAY and PEARL, our results are in line with those previously described in the literature, favoring HUPI's performance to explain radiographic progression.
Our study has strengths, such as a study population including patients with both early and established RA, as well as a thorough statistical analysis. Nonetheless, it also presents some limitations, the most important being the low radiographic progression observed in both cohorts, which might have affected the performance of the three disease activity indices. This prevented us from establishing firm conclusions from the comparative analysis. Another limitation is the fact that IL-6 serum levels were only available from PEARL, something that limited the number of visits/patients assessed.
In conclusion, HUPI performs slightly better than DAS28 and SDAI in identifying declines in physical function and radiographic progression, and detects changes in IL-6 serum levels similarly to the other indices. This behavior is consistent in early and established RA. These new findings, in addition to the absence of sex bias and the possibility of calculating it with either CRP or ESR, reinforce the role of HUPI for research purposes.

DATA AVAILABILITY STATEMENT
The data analyzed in this study is subject to the following licenses/restrictions: Data from the PEARL study can be requested from the corresponding author. Data from ACT-RAY were provided by Hoffmann-La Roche Ltd through a data-sharing agreement that does not allow for the public sharing of these data. The authors did not enjoy any special access privileges in gaining access to these data. Regarding the possibility that any other researcher would like to request data to replicate the reported study findings, Hoffmann-La Roche Ltd has implemented a Data Sharing policy to align with the ICMJE recommendations: "Qualified researchers may request access to individual patient-level data through the clinical study data request platform (www.clinicalstudydatarequest.com). Further details on Roche's criteria for eligible studies are available in https://clinicalstudydatarequest.com/Study-Sponsors/Study-Sponsors-Roche.aspx. For further details on Roche's Global Policy on the Sharing of Clinical Information and how to request access to related clinical study documents, see https://www.roche.com/research_and_development/who_we_are_how_we_work/clinical_trials/our_commitment_to_data_sharing.html". Requests to access these datasets should be directed to ACT-RAY data: www.clinicalstudydatarequest.com. PEARL data: Isidoro Gonzalez-Alvaro, isidoro.ga@ser.

ETHICS STATEMENT
This is a secondary analysis of anonymized data from patients included in the ACT-RAY and PEARL studies. The ACT-RAY trial was approved by the Ethics Committees of each participating center and the PEARL study was approved by the Ethics Committee for Clinical Research at the Hospital Universitario de La Princesa (PI-518; March 28th, 2011). Both studies were conducted according to the principles of the Helsinki Declaration. The patients/participants provided their written informed consent to participate.

AUTHOR CONTRIBUTIONS
IG-Á, SR-G, and NM contributed to conception and design of the study and organized the database. NM performed the statistical analysis and wrote a section of the manuscript. SR-G wrote the first draft of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.

FUNDING
This study was funded by the Instituto de Salud Carlos III through the grant PI18/00371 and the RETICS Program RIER: RD16/0012/0004; RD16/0012/0011 (co-funded by the European Regional Development Fund, "A way to make Europe"). SR-G was funded by the Spanish Rheumatology Foundation (Grants for physician-researchers 2018-2021).