External Validation of the Machine Learning-Based Thermographic Indices for Rheumatoid Arthritis: A Prospective Longitudinal Study

External validation is crucial in developing reliable machine learning models. This study aimed to validate three novel indices—Thermographic Joint Inflammation Score (ThermoJIS), Thermographic Disease Activity Index (ThermoDAI), and Thermographic Disease Activity Index-C-reactive protein (ThermoDAI-CRP)—based on hand thermography and machine learning to assess joint inflammation and disease activity in rheumatoid arthritis (RA) patients. A 12-week prospective observational study was conducted with 77 RA patients recruited from rheumatology departments of three hospitals. During routine care visits, indices were obtained at baseline and week 12 visits using a pre-trained machine learning model. The performance of these indices was assessed cross-sectionally and longitudinally using correlation coefficients, the area under the receiver operating curve (AUROC), sensitivity, specificity, and positive and negative predictive values. ThermoDAI and ThermoDAI-CRP correlated with CDAI, SDAI, and DAS28-CRP cross-sectionally (ρ = 0.81; ρ = 0.83; ρ = 0.78) and longitudinally (ρ = 0.55; ρ = 0.61; ρ = 0.60), all p < 0.001. ThermoDAI and ThermoDAI-CRP also outperformed Patient Global Assessment (PGA) and PGA + C-reactive protein (CRP) in detecting changes in 28-swollen joint counts (SJC28). ThermoJIS had an AUROC of 0.67 (95% CI, 0.58 to 0.76) for detecting patients with swollen joints and effectively identified patients transitioning from SJC28 > 1 at baseline visit to SJC28 ≤ 1 at week 12 visit. These results support the effectiveness of ThermoJIS in assessing joint inflammation, as well as ThermoDAI and ThermoDAI-CRP in evaluating disease activity in RA patients.


Introduction
Rheumatoid arthritis (RA) is a chronic inflammatory disease that causes persistent joint inflammation.RA is characterised by significant pain, loss of joint function, disability, and reduced quality of life.Current therapies, combined with the implementation of treat-to-target and tight control strategies, have been shown to lead to improved outcomes and make remission an achievable goal [1][2][3][4][5].The cardinal signs of inflammation include pain, swelling, heat, and redness.Pain and swelling are typically the primary focus of the physical examination when assessing inflammation.However, recent studies indicate that the warmth caused by synovial inflammation can also be a useful marker of joint inflammation [6][7][8][9][10][11][12][13][14][15][16].
Despite advances in RA assessment, accurate and timely detection of joint inflammation remains a challenge.While current imaging techniques, such as ultrasound and Magnetic Resonance Imaging (MRI), are effective, they can be time-consuming, costly, and require specialised training.Consequently, these methods are often not feasible for routine clinical practice.Therefore, there is a need for a rapid, affordable, and accurate method to assess joint inflammation in patients with RA.
Thermography is a fast, non-invasive imaging technique that uses thermal cameras to capture the intensity of long-wave infrared radiation emitted by bodies, generating pictures of the heat pattern [17][18][19].Therefore, thermography represents a potentially valuable tool for the detection of arthritis [20][21][22].In our previous work, we developed and trained a machine learning-based computational method that analyses thermal images of hands to assess joint inflammation.The resulting machine learning model produces a score named the Thermographic Joint Inflammation Score (ThermoJIS) [23].In addition, we expanded the functionality of ThermoJIS by incorporating the Patient Global Assessment of the disease and acute phase reactants to develop two novel disease activity indices, namely Thermographic Disease Activity Index (ThermoDAI) and Thermographic Disease Activity Index C-reactive protein (ThermoDAI-CRP) [24].Our initial validation with a hold-out dataset (independent from the training and tuning set) yielded promising results, but further external validation across diverse populations and clinical settings is crucial to confirm these results.Moreover, our previous studies were cross-sectional, so longitudinal studies are needed to evaluate sensitivity to change.
The aims of this study were to assess the effectiveness of ThermoJIS, ThermoDAI, and ThermoDAI-CRP on a new, independent prospective dataset and to evaluate their sensitivity to change.

Recruited Patients
The study design was a non-interventional prospective observational study conducted for 12 weeks.The study population comprised 77 RA consecutive patients recruited at outpatient visits to the departments of rheumatology of three hospitals between April and November 2022.Follow-up visits were conducted at week 12.The inclusion criteria were age over 18 years and a diagnosis of RA, as defined by the 2010 ACR/European League Against Rheumatism (EULAR) classification criteria [25].Exclusion criteria were subjects with wounds, infections, or trauma on the dorsal side of the hands and those using bandages, cosmetics, or other substances that could affect the thermal pattern prior to data collection.
Diagnosis and management of the RA were performed according to the standard of care.The study was conducted in accordance with the ethical principles of the Declaration of Helsinki and Good Clinical Practice guidelines.Approval for human participant involvement was obtained from the following ethics committees:

Thermography Tests
All patients underwent thermography of both hands using a Thermal Expert TE-Q1 camera (i3system Inc., Daejeon, Korea) with a 6.8 mm lens.Thermal cameras were connected to a smartphone, and a custom mobile application was developed to acquire the raw thermal images, which corresponded to the intensity of infrared waves (see an image of the mobile application in Figure S1).To replicate real-world conditions, thermography was conducted during outpatient visits without an acclimatisation process or controlled room temperature.Thermal imaging was performed before the physical examination.The dorsal images of both hands were recorded with the fingers spread.No fixed distance between the camera and the hand was required.The researcher was instructed to frame and focus the image adequately to include all the joints in each hand.Subsequently, all the thermal images were analysed to obtain the ThermoJIS values.The results of ThermoJIS were not available to the researchers before the clinical examination.

Clinical and Laboratory Assessments
Clinical assessments were performed and included the number of swollen and tender joints in the standard 28-joint count examination (SJC28 and TJC28, respectively), as well as the Patient Global Assessment (PGA), the Evaluator Global Assessment (EGA) of disease activity based on a visual analogue scale score (VAS 0-10), the Health Assessment Questionnaire for Rheumatoid Arthritis (HAQ-DI) [26,27], and treatment-related data were collected at both the baseline and week 12 follow-up visits.Additionally, laboratory analyses were conducted, including the erythrocyte sedimentation rate (ESR) and C-reactive protein value (CRP).Response to treatment was evaluated with ACR20 and ACR50 response (defined as a 20% and 50%, respectively, reduction in the number of tender and swollen joints, as well as improvement in at least three of the other five ACR components) [28] and EULAR response criteria [29].

Calculation of ThermoJIS, ThermoDAI and ThermoDAI-CRP
ThermoJIS values were calculated using Singularity Biomed's in-house software (v1.1, Singularity Biomed, Sant Cugat del Vallès, Spain) as follows: Initially, the images were processed for contrast enhancement, background removal, and noise reduction.Subsequently, a collection of local thermal features comprising unique inflammation patterns was extracted.Finally, a fixed deterministic machine learning model was used to analyse the features and determine the ThermoJIS value.Appendix A explains the previous development of the machine learning model.A higher ThermoJIS value corresponded to a greater probability of active synovitis.
ThermoDAI and ThermoDAI-CRP were defined as the simple linear sum of the outcome parameters: ThermoJIS, PGA, and for ThermoDAI-CRP, the CRP in mg/dL.Thus, the formulas for ThermoDAI and ThermoDAI were defined as follows: ThermoDAI-CRP = ThermoJIS + PGA + CRP ThermoJIS values and CRP (mg/dL) are limited from 0 to 10.Therefore, ThermoDAI and ThemoDAI-CRP range from 0 to 20 and from 0 to 30, respectively.

Statistical Analysis
Subject characteristics were described using means with standard deviations (SD), medians with interquartile ranges (IQR), or frequencies with proportions, as appropriate.The Shapiro-Wilk test was used to assess the normality of the data distributions.To calculate the statistical significance of differences between baseline and week 12, the paired t-test or Wilcoxon signed rank was used.McNemar's test was also used to detect changes in proportions.The number of participants exceeded the minimum sample size calculations to provide robust results.Increased data in external validations enhances the generalisability and reliability of the findings.Correlations between scores were calculated using either Spearman's rank correlation coefficient or Pearson's correlation coefficient.For binary classifications, the area under the receiver operating characteristic curve (AUROC), contingency table, and related metrics such as positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity were used.Cross-sectional analyses were conducted using the aggregate of visits from both baseline visits and week 12 visits.Patient visits with missing data were excluded.The Mann-Whitney U test was used to assess statistical significance in the non-normal distribution of independent samples.Statistical significance was determined with p-values below 0.05 (two-sided).The statistical analysis was performed using Python (v3.11).

Patient Characteristics
Four patients missed the week 12 visit, resulting in a total of 73 patients completing the study.All subjects tolerated the procedure well, and no adverse effects were observed.Table 1 presents the demographic, clinical, laboratory, disease activity, and thermographic characteristics of the patients at both the baseline visit and the week 12 visit.Distributions of ThermoJIS, ThermoDAI, and ThermoDAI-CRP are shown in Figure S2.

Association of ThermoJIS with Swollen Joint Count
ThermoJIS was evaluated for its ability to detect swollen joints defined as SJC28 > 1, using the aggregate of both baseline and week 12 visits and had an AUROC of 0.67 (95% CI, 0.58 to 0.76; p < 0.001) (Figure 1a).The Spearman's correlation between ThermoJIS and SJC28 was 0.30 (p < 0.001) (Figure 1b).Using a cutoff threshold of 3.56, decided a priori as the optimal threshold with good sensitivity [23], ThermoJIS showed a sensitivity of 81%, specificity of 41%, positive predictive value (PPV) of 43%, and negative predictive value (NPV) of 80%.

Sensitive Detection of Synovitis
In order to indirectly evaluate whether ThermoJIS might be detecting synovitis that is not evident through swollen joint count, the distributions of ThermoJIS were compared using the aggregate of both at baseline and week 12 visits in two groups of patients: those with SJC28 = 0 and PGA > 5, and those with SJC28 = 0 and PGA ≤ 5 (Figure 2).The results show that ThermoJIS values are significantly higher in patients with PGA > 5 compared with those with PGA ≤ 5. Furthermore, a Spearman's correlation coefficient of 0.30 (p = 0.03) is observed between ThermoJIS and PGA in patients with SJC28 ≤ 1.

Sensitive Detection of Synovitis
In order to indirectly evaluate whether ThermoJIS might be detecting synovitis that is not evident through swollen joint count, the distributions of ThermoJIS were compared using the aggregate of both at baseline and week 12 visits in two groups of patients: those with SJC28 = 0 and PGA > 5, and those with SJC28 = 0 and PGA ≤ 5 (Figure 2).The results show that ThermoJIS values are significantly higher in patients with PGA > 5 compared with those with PGA ≤ 5. Furthermore, a Spearman's correlation coefficient of 0.30 (p = 0.03) is observed between ThermoJIS and PGA in patients with SJC28 ≤ 1.

Sensitive Detection of Synovitis
In order to indirectly evaluate whether ThermoJIS might be detecting synovitis that is not evident through swollen joint count, the distributions of ThermoJIS were compared using the aggregate of both at baseline and week 12 visits in two groups of patients: those with SJC28 = 0 and PGA > 5, and those with SJC28 = 0 and PGA ≤ 5 (Figure 2).The results show that ThermoJIS values are significantly higher in patients with PGA > 5 compared with those with PGA ≤ 5. Furthermore, a Spearman's correlation coefficient of 0.30 (p = 0.03) is observed between ThermoJIS and PGA in patients with SJC28 ≤ 1.

Association of the Change of ThermoJIS with the Change of Swollen Joint Count
The Spearman's correlation between the difference of ThermoJIS at baseline and the week 12 visit and the difference in SJC28 is 0.18 (p = 0.125), indicating a weak and nonsignificant correlation.However, the difference in ThermoJIS is found to have an AUROC of 0.71 (95% CI, 0.52 to 0.90, p = 0.053) for detecting when a patient transitions from SJC28 > 1 to SJC28 ≤ 1 (Figure 3).The number of patients with SJC28 > 1 at baseline was 31.

Association of the Change of ThermoJIS with the Change of Swollen Joint Count
The Spearman's correlation between the difference of ThermoJIS at baseline and the week 12 visit and the difference in SJC28 is 0.18 (p = 0.125), indicating a weak and nonsignificant correlation.However, the difference in ThermoJIS is found to have an AUROC of 0.71 (95% CI, 0.52 to 0.90, p = 0.053) for detecting when a patient transitions from SJC28 > 1 to SJC28 ≤ 1 (Figure 3).The number of patients with SJC28 > 1 at baseline was 31.

Association of the Change of ThermoJIS with the Change of Swollen Joint Count
The Spearman's correlation between the difference of ThermoJIS at baseline and the week 12 visit and the difference in SJC28 is 0.18 (p = 0.125), indicating a weak and nonsignificant correlation.However, the difference in ThermoJIS is found to have an AUROC of 0.71 (95% CI, 0.52 to 0.90, p = 0.053) for detecting when a patient transitions from SJC28 > 1 to SJC28 ≤ 1 (Figure 3).The number of patients with SJC28 > 1 at baseline was 31.

Discussion
External validation is an essential step in the development of any machine learning model before it is clinically applied, as it ensures that study findings are reproducible across different populations.In this study, we evaluated the performance of ThermoJIS, ThermoDAI, and ThermoDAI-CRP [23,24] in a new longitudinal cohort of RA patients from three hospitals.
ThermoJIS was originally trained to detect active synovitis in the hands using ultrasound as the gold standard technique [23].Because this study used SJC28 instead of ultrasound as the criterion for joint inflammation, which is less sensitive and accurate, lower performance metrics than previous ultrasound-based studies were expected [30][31][32][33][34][35].Additionally, SJC28 includes joints outside the hands that are not evaluated by ThermoJIS, such as shoulders, elbows, and knees [36].Despite this, the results show the ability of ThermoJIS to detect joint inflammation.Furthermore, even though the correlation between changes in ThermoJIS and changes in SJC28 was weak, ThermoJIS was able to detect patients with swollen joints at the first visit who became inflammation-free at the later visit.These findings suggest the robustness of ThermoJIS and provide evidence of its validity.However, further research is needed to fully evaluate its performance using ultrasound as a criterion of validity.
In our previous study, ThermoJIS was significantly higher if active synovitis was detected using ultrasonography in clinical remission patients, regardless of the definition used, showing that ThermoJIS was sensitive enough for detecting subclinical synovitis [23].
In the present study, patients without swollen joints might still have joint inflammation.To examine this, we compared patients with SJC28 of 0 and PGA of 5 or less with patients with SJC28 of 0 and PGA greater than 5, hypothesising that PGA may indicate underlying inflammation.As expected, in the group of patients with SJC28 of 0 and PGA greater than 5, ThermoJIS levels were significantly higher.These findings suggest that, in certain cases, the false positives may actually be true positives, and that could be a cause of reduced performance metrics.Moreover, these results suggest that in some cases, ThermoJIS was detecting inflammation that the rheumatologist was not able to detect in the physical examination (subclinical synovitis), consolidating the evidence that this new technique could be useful in routine clinical practice to help the rheumatologist in the detection of joint inflammation.Subclinical synovitis has been associated with a higher risk of flares, progression of structural damage, and unsuccessful drug tapering, especially when Doppler activity is present [37,38].Ultrasonography and Magnetic Resonance Imaging are sensitive methods for evaluating clinical and subclinical synovitis in RA, although routine use of these techniques is not feasible for most outpatient visits [39,40].Evaluating synovitis with ThermoJIS is non-invasive, instantaneous, and automatic, making this tool accessible and suitable for routine use in clinical practice.
The thermographic disease activity indices, ThermoDAI and ThermoDAI-CRP, showed strong and consistent correlations with commonly used disease activity indices.Notably, the correlations between changes (from baseline to the week 12 visit) in ThermoDAI and ThermoDAI-CRP versus changes in SJC28 were found to be stronger than those between PGA and PGA + CRP versus SJC28, indicating their utility to detect changes in joint inflam-mation and disease activity over time when physical examination is not possible, even at the patient's home, enabling remote assessment of RA patients.Furthermore, these indices were able to detect patients who met ACR20, ACR50, and EULAR treatment response criteria.This result suggests that ThermoDAI and ThermoDAI-CRP are promising new indices suitable for continuously monitoring patients in remote clinical trials, providing more information about their response to new treatments.
This study is subject to limitations.Although we have externally validated the pretrained machine learning model in a longitudinal cohort, a larger cohort using hand ultrasound as a reference would elucidate the actual performance of the technique, particularly considering the limited sensitivity of physical examination and the fact that the 28 joints examined include some that are outside the hands and cannot be evaluated with thermography.Additionally, the imbalance in the recruited population introduces a potential limitation, as only 35% of patients in our cohort exhibited inflammation compared with 53% in the previous study.This difference may be partially attributed to the inclusion of subclinical cases.Furthermore, in this study, the average changes in TJC28, SJC28, and ThermoJIS between the baseline visit and the 12-week visit were relatively small, whereas there was a significant decrease in disease activity.Thus, it is likely that correlations between the change in ThermoDAI and ThermoDAI-CRP with the change in CDAI and SDAI were mainly driven by the change in PGA and CRP levels.To address this, future studies should aim to include a more balanced cohort of patients with a higher change in TJC28 and SJC28 between visits.Therefore, future research and development should include larger and more diverse cohorts of patients to ensure the generalisability and reliability of the performance outcomes.In addition, cost-effectiveness and impact studies should be conducted to evaluate the integration of thermography into routine clinical workflows.

Conclusions
The results of this external validation support the effectiveness of the Thermographic Joint Inflammation Score (ThermoJIS) in detecting joint inflammation in RA patients.In addition, both the Thermographic Disease Activity Index (ThermoDAI) and Thermographic Disease Activity Index C-reactive protein (ThermoDAI-CRP) have shown construct validity and sensitivity to change, indicating their usefulness in assessing the disease activity and treatment response in patients with RA.
Thermography is a rapid, portable, non-invasive, and affordable imaging technique that provides significant advantages for routine use.Moreover, the integration of machine learning allows for automatic and objective results without requiring extensive training, improving accessibility.These results provide encouraging evidence for the use of these novel indices as a complementary tool in routine clinical practice and in clinical trials for the evaluation of RA patients.

Figure 2 .
Figure 2. Indirect analysis of the Thermographic Joint Inflammation Score (ThermoJIS) performance in detecting synovitis that is not evident through swollen joint count by comparing the ThermoJIS values of patients with 28-swollen joint counts (SJC28) = 0 and Patient Global Assessment (PGA) >5 (n = 11) versus patients with SJC28 = 0 and PGA ≤ 5 (n = 62).The Mann-Whitney U test was used to assess statistical significance.

Figure 2 .
Figure 2. Indirect analysis of the Thermographic Joint Inflammation Score (ThermoJIS) performance in detecting synovitis that is not evident through swollen joint count by comparing the ThermoJIS values of patients with 28-swollen joint counts (SJC28) = 0 and Patient Global Assessment (PGA) >5 (n = 11) versus patients with SJC28 = 0 and PGA ≤ 5 (n = 62).The Mann-Whitney U test was used to assess statistical significance.

Figure 2 .
Figure 2. Indirect analysis of the Thermographic Joint Inflammation Score (ThermoJIS) performance in detecting synovitis that is not evident through swollen joint count by comparing the ThermoJIS values of patients with 28-swollen joint counts (SJC28) = 0 and Patient Global Assessment (PGA) >5 (n = 11) versus patients with SJC28 = 0 and PGA ≤ 5 (n = 62).The Mann-Whitney U test was used to assess statistical significance.

Figure 7 .
Figure 7. Detection of patients who met the American College of Rheumatology 20% response (ACR20), the American College of Rheumatology 50% response (ACR50), and European Alliance of Associations for Rheumatology (EULAR) treatment response criteria using the Thermographic Disease Activity Index (ThermoDAI) and Thermographic Disease Activity Index-C-reactive protein