Radiomics-based machine learning model to phenotype hip involvement in ankylosing spondylitis: a pilot study

Objectives Hip involvement is an important cause of disability in patients with ankylosing spondylitis (AS). Characterizing the phenotypes of hip involvement in AS remains an unmet need for understanding its biological mechanisms and improving clinical decision-making. Radiomics is a promising quantitative image analysis method that has been used successfully to describe the phenotypes of a wide variety of diseases, but it has rarely been reported in AS. The objective of this study was to investigate the feasibility of a radiomics-based approach to profiling hip involvement in AS. Methods A total of 167 patients with AS were included. Radiomic features were extracted from pelvic MRI after image preprocessing and feature engineering. We then applied an unsupervised machine learning method to derive radiomics-based phenotypes. The derived phenotypes were validated and interpreted from the perspectives of clinical background and MRI characteristics. The association between the derived phenotypes and radiographic outcomes was evaluated by multivariable analysis. Results A total of 1321 robust radiomic features were identified and four biologically distinct phenotypes were derived. According to the patients' clinical backgrounds, phenotypes I (38, 22.8%) and II (34, 20.4%) were labelled as high-risk, while phenotypes III (24, 14.4%) and IV (71, 42.5%) were at low risk for hip involvement. Consistently, the high-risk phenotypes were associated with a higher prevalence of MRI-detected lesions than the low-risk phenotypes. Moreover, phenotype I showed more pronounced signs of acute inflammation than phenotype II, while phenotype IV was enthesitis-predominant. Importantly, the derived phenotypes were highly predictive of patients' radiographic outcomes: the high-risk phenotypes had about threefold higher odds of radiological hip lesions than the low-risk phenotypes [27 (58.7%) vs 16 (28.6%); adjusted odds ratio (OR) 2.95 (95% CI 1.10, 7.92)].
Conclusion We confirmed, for the first time, the clinical actionability of profiling hip involvement in AS with a radiomics method. Four distinct phenotypes of hip involvement in AS were identified and, importantly, the high-risk phenotypes could predict structural damage of the hip in AS.


Introduction
Ankylosing spondylitis (AS) is a chronic inflammatory disease that primarily involves the spine, sacroiliac joints and peripheral joints, and can lead to significant morbidity and disability (1). Hip involvement is a prevalent manifestation and an important cause of disability in AS. It is also associated with spinal damage, functional impairment, increased disease burden and poor prognosis (2,3). Magnetic resonance imaging (MRI) can detect early hip lesions in AS and plays an important role in the diagnosis of hip involvement (4). However, MRI-detected hip lesions such as joint effusion and subchondral bone marrow edema (BME) are not specific to AS; they also appear in a wide spectrum of clinical entities such as osteoarthritis, stress injury, avascular necrosis of the femoral head, joint infection and inflammatory disorders (5,6). Moreover, relying only on the presence of abnormal MRI lesions tends to overestimate the prevalence of hip involvement in AS (7), and a gold-standard MRI definition of hip involvement in AS is still lacking. Therefore, a new method that accurately predicts hip involvement in AS is urgently needed.
Radiomics has gained increasing attention over the last decade as a promising quantitative image analysis method that has been used successfully in patient phenotyping and prediction of treatment response in a wide variety of diseases (8,9). Generally, radiomic features are first extracted from regions of interest (ROIs) in routine images such as CT or MRI. The radiomic features, which contain crucial information about the disease, are then processed by artificial intelligence techniques such as machine learning (ML) or deep learning. Radiomics originated in oncology studies and has been extended to musculoskeletal diseases in the last few years (10). Moreover, ML-based deciphering of complex diseases, such as sepsis, heart failure, ARDS and COVID-19 (11-14), has successfully identified biologically distinct phenotypes and facilitated the understanding of their biological mechanisms. Therefore, we hypothesized that radiomics is a promising method for profiling hip involvement in AS. We conducted this pilot study to evaluate the clinical actionability of using radiomics data to phenotype AS patients with symptomatic hip involvement and to predict structural damage of the hip joint in AS.

Materials and methods
We retrospectively investigated AS patients with hip joint pain who underwent pelvic MRI examinations from January 2019 to September 2022 at the First Medical Center of the Chinese People's Liberation Army (PLA) General Hospital, a tertiary referral center in Beijing. All enrolled patients met the following criteria: they were diagnosed with AS according to the 1984 modified New York criteria (15), and their MRI images fulfilled the quality criteria for reading. Patients with other comorbidities that could cause hip joint pain were excluded. Sociodemographic data, type of previous anti-inflammatory medication (non-steroidal anti-inflammatory drugs (NSAIDs) and tumor necrosis factor inhibitors (TNFi)) and clinical assessments were obtained from medical records. Clinical assessments included age at onset, disease duration, history of peripheral arthritis, serum inflammatory marker levels (C-reactive protein (CRP) and erythrocyte sedimentation rate (ESR)) and HLA-B27 status. Furthermore, anteroposterior pelvic X-rays were collected, and the severity of structural damage of the hip joint was assessed by the Bath ankylosing spondylitis radiology hip index (BASRI-hip) (16). Research ethics approval was granted by the Ethical Committee of the Chinese PLA General Hospital (S2023-375-01), and informed consent was waived owing to the retrospective nature of the study. Our work was conducted in accordance with the Declaration of Helsinki.

MRI image acquisition and preprocessing
Reflecting real-world practice, patients underwent MRI examinations on 8 MRI scanners at our hospital. The parameters of the different scanners are detailed in Supplementary Table S1. To correct the heterogeneity of radiomic features caused by different scanners, we used a practical realignment approach, the ComBat compensation method (17). This method realigns image-derived data in a single space in which the batch effect is removed, which enables pooling of data from different scanners and centers without a substantial loss of statistical power caused by intra- and inter-center variability (18,19). For image preprocessing, a fixed bin size of 25 was used for image discretization to filter noise from the images, and all images were resampled to the same voxel size (1 × 1 × 1 mm³) to standardize the voxel spacing. A detailed workflow of the steps involved in our study is summarized in Figure 1.
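The discretization step can be illustrated with a minimal Python sketch. It assumes that the bin size of 25 denotes a fixed bin width and that bin indices start at 1 from the bin containing the lowest grey level, which is a common fixed-bin-width convention; the function name and the example grey levels are hypothetical and are not taken from the study.

```python
import math


def discretize_fixed_bin_width(intensities, bin_width=25.0):
    """Map grey levels to integer bin indices (starting at 1).

    Assumption: bins are aligned to multiples of bin_width, and the bin
    holding the lowest grey level in the ROI receives index 1.
    """
    offset = math.floor(min(intensities) / bin_width)
    return [math.floor(v / bin_width) - offset + 1 for v in intensities]


# Hypothetical grey levels sampled from a small ROI
roi = [3.0, 24.9, 25.0, 49.0, 130.0]
bins = discretize_fixed_bin_width(roi)
print(bins)  # [1, 1, 2, 2, 6]
```

Grey levels that differ by less than one bin width tend to fall into the same bin, which is how discretization suppresses small intensity fluctuations before texture features are computed.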

Image evaluation and region segmentation
Conventional MRI characteristics of the hip joint were reported by two musculoskeletal radiologists (reader 1 and reader 2). The severity of structural damage of the hip joint was also assessed by reader 1 according to the BASRI-hip. The presence of joint effusion, BME and enthesitis was considered active inflammatory change, whereas sclerosis, subchondral erosion, joint space narrowing and fat lesion were termed structural damage of hip involvement, with reference to a previously reported method (7). Additionally, we used a qualitative method to define these lesions: the presence of a defined lesion in any slice of the hip MRI was considered positive for that lesion. A senior radiologist was consulted to reach the final conclusion if the two observers disagreed. Then, a fellowship-trained operator (reader 3) delineated the entire hip joint, composed of the femur, acetabulum and joint space, as the region of interest (ROI). The reader delineated the ROIs with reference to the extent of the proximal femur, acetabulum and hip joint capsule in each slice using the open-source software 3D Slicer (version 5.0.3). The ROIs were drawn manually slice by slice in the axial plane using an edge-based tool and then fine-tuned with the smoothing tool in 3D Slicer (Figure 2).

Radiomic features extraction and selection
Radiomic features were extracted with the open-source radiomics platform Pyradiomics (version 3.0.1) in Python (version 3.7). Radiomic features were defined according to the Image Biomarkers Standardization Initiative (IBSI) (20) and fell into the following categories: first-order (n=18), shape (n=8) and texture (n=75) features. Moreover, 14 image filters were applied, and high-order features (n=1210) were extracted after decomposition of the original images by the filters. A list of all radiomic features with detailed explanations is provided in Supplementary Table S2. Redundancy was checked, and invariant radiomic features were removed. Additionally, to assess the reliability of the manual segmentation process, another observer (reader 1) delineated the ROIs of 15 randomly selected patients after a training session and consensus meeting with reader 3. Inter-observer (reader 1 and reader 3) and intra-observer (reader 3 twice) intraclass correlation coefficients (ICCs) were then calculated to evaluate the reliability of the extracted radiomic features. Only features with good reproducibility, defined as both inter-observer and intra-observer ICC ≥ 0.75, were considered in further analyses. All selected features were normalized by Z-score standardization before the next step.
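The reproducibility filter and the standardization step can be sketched in plain Python as follows. This is an illustration only, not the R 'psych' routine used in the study: it assumes the single-measurement form of the two-way mixed-effects consistency ICC, often written ICC(3,1), and a sample standard deviation for the Z-score; the feature names and ratings are hypothetical.

```python
def icc_consistency(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    ratings: one row per subject, one column per rater or session.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def zscore(values):
    """Standardize to mean 0 and unit sample standard deviation."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5
    return [(v - mean) / sd for v in values]


# Hypothetical feature values from two delineations of four patients
paired = {
    "feature_a": [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]],
    "feature_b": [[1.0, 4.0], [2.0, 1.0], [3.0, 5.0], [4.0, 2.0]],
}
iccs = {name: icc_consistency(rows) for name, rows in paired.items()}
kept = [name for name, icc in iccs.items() if icc >= 0.75]
print(kept)  # ['feature_a']
```

In the study a feature had to pass the threshold for both the inter-observer and the intra-observer ICC; the sketch applies a single comparison for brevity. A constant offset between readers, as in feature_a, does not lower a consistency-type ICC.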

Phenotype derivation, validation and interpretation
Once the radiomic features were selected and prepared, unsupervised agglomerative hierarchical clustering with Euclidean distance and the Ward linkage criterion was applied to identify radiomics-based patient clusters. A dendrogram visualizing the clustering procedure and the distances between clusters at different levels was prepared to help determine the optimal number of clusters (phenotypes).
The validation of the derived phenotypes was conducted in three ways. First, we characterized the derived phenotypes by clinical background. In detail, we evaluated between-group differences in clinical factors associated with hip involvement, such as juvenile onset, disease duration, cigarette smoking, TNFi treatment and serum inflammatory markers. Second, we interpreted the phenotyping results by profiling the heterogeneity of MRI-detected hip lesions between phenotypes. Third, we assessed the radiographic outcomes of hip involvement by the BASRI-hip criteria to evaluate the performance of radiomics-based phenotyping in predicting structural damage of the hip joint.
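A minimal sketch of Ward-linkage agglomerative clustering is given here for orientation. It uses SciPy rather than the scikit-learn implementation used in the study, and the data are synthetic: four well-separated groups of 20 hypothetical patients with two features each. The merge table `Z` is the structure from which a dendrogram can be drawn.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic stand-in for the feature matrix (rows: patients, columns: features)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Z-score standardization of each feature before clustering
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Agglomerative clustering with Euclidean distance and Ward linkage
Z = linkage(X, method="ward", metric="euclidean")

# Cut the tree into at most four clusters (labels run from 1 to 4)
labels = fcluster(Z, t=4, criterion="maxclust")
print(np.bincount(labels)[1:])  # cluster sizes
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` plots the merge heights; a large jump in height between successive merges is one informal cue for choosing the number of clusters.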

Validation of radiomic-derived phenotypes
To evaluate the robustness and reliability of the phenotypes obtained from unsupervised agglomerative hierarchical clustering, we applied a consensus clustering algorithm using the 'ConsensusClusterPlus' package (version 1.62.0). This method involves conducting multiple iterations of clustering on resampled data and then measuring the consistency of the resulting clusters across these iterations (21).
The performance of consensus clustering was assessed using the consensus matrix, cumulative distribution function (CDF) curve, relative alterations in the area under the CDF curve (Delta Area Plot), and cluster-consensus plot, in order to help determine the optimal number of phenotypes and evaluate whether the derived phenotypes are reasonable.
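The idea behind the consensus matrix can be sketched in Python. This is not the ConsensusClusterPlus implementation (an R package); it is a simplified illustration in which the resampling fraction (0.8), the number of resamples (50), the Ward-linkage base clusterer and the synthetic data are all assumptions. Each entry of the matrix is the fraction of resamples in which two patients were assigned to the same cluster, among the resamples that contained both.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def consensus_matrix(X, k, n_resamples=50, fraction=0.8, seed=0):
    """Co-clustering frequency of every pair of rows over subsampled runs."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    together = np.zeros((n, n))  # times a pair shared a cluster
    sampled = np.zeros((n, n))   # times a pair was drawn together
    m = int(round(fraction * n))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=m, replace=False)
        labels = fcluster(linkage(X[idx], method="ward"), t=k,
                          criterion="maxclust")
        same = (labels[:, None] == labels[None, :]).astype(float)
        sampled[np.ix_(idx, idx)] += 1.0
        together[np.ix_(idx, idx)] += same
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(sampled > 0, together / sampled, 0.0)


# Synthetic data: four well-separated groups of 20 hypothetical patients
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

M = consensus_matrix(X, k=4)
print(M[0, 1], M[0, 79])  # same-group pair vs different-group pair
```

For a stable partition the entries concentrate near 0 and 1, so the CDF of the consensus values is flat in between; comparing the area under that CDF across candidate values of k is what the Delta Area Plot summarizes.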

Statistics
Descriptive statistical analysis was performed using SPSS Statistics (version 22; IBM Corp.). Missing data were addressed using multiple imputation with 5 iterations, assuming they were missing at random. Other analyses were implemented in Python (version 3.7) and the R programming language (version 4.2.1). The ICC was calculated with a two-way mixed-effects model and the consistency definition, using the R package 'psych' (version 2.2.9). Unsupervised agglomerative hierarchical clustering and the construction of the dendrogram were based on the Python package 'scikit-learn' (version 0.22.1). Chord diagrams were created using the R package 'circlize' (version 0.4.15). We used binary logistic regression to estimate odds ratios (ORs) and 95% CIs of having radiological hip involvement across the derived phenotypes. For all analyses, two-sided P values <0.05 were considered significant.
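As a worked illustration of what an odds ratio and its confidence interval express, the sketch below computes a crude (unadjusted) OR with a Wald-type 95% CI from a 2 × 2 table. It is not the multivariable logistic regression used in the study. The event counts (27 and 16) are those reported in the abstract; the group sizes (46 and 56) are back-calculated from the reported percentages and should be read as an assumption.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald 95% CI from a 2 x 2 table.

    a, b: events and non-events in the group of interest
    c, d: events and non-events in the reference group
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Assumed table: 27 of 46 high-risk vs 16 of 56 low-risk patients with
# radiological hip lesions (denominators inferred from the percentages)
crude_or, ci_low, ci_high = odds_ratio_with_ci(27, 46 - 27, 16, 56 - 16)
print(round(crude_or, 2), round(ci_low, 2), round(ci_high, 2))
```

A crude estimate of this kind need not match the adjusted OR of 2.95 reported in the study, because the logistic regression additionally accounts for confounding factors.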

Radiomic features and phenotypes derivation
A total of 1422 radiomic features were extracted from T2WI MRI images. After removing redundant and unstable features, 1321 robust radiomic features were identified and used for model construction. The agglomerative hierarchical clustering model identified four phenotypes of patients (Figure 3). Characteristics including demographics, clinical variables, serum inflammatory markers and previous treatments across the four phenotypes are presented in Table 1.
Phenotype I consisted of 38 (22.8%) patients [...] labelled as high-risk, while phenotypes III and IV were at low risk for hip involvement in AS.

Validation of radiomic-derived phenotypes by consensus clustering
To assess the robustness of the derived 4-phenotype structure of the radiomics data, we performed consensus clustering to validate the radiomics-based phenotypes. Based on the consensus matrix (Figure 4A), CDF curve (Figure 4B) and Delta Area Plot (Figure 4C), k = 4 was identified as the optimal value for phenotyping the AS patients. Additionally, as expected, these four phenotypes had high cluster-consensus values (Figure 4D), indicating strong stability of the radiomic-derived phenotypes.
Therefore, according to clinical behavior, MRI characteristics and radiographic outcomes, patients in phenotypes I and II could be labeled as "advanced-stage hip involvement". Patients in phenotype I concomitantly exhibited significant signs of acute inflammation and required anti-inflammatory therapy, especially TNFi treatment. Phenotypes III and IV were regarded as "early-stage hip involvement"; phenotype IV was enthesitis-predominant, whereas phenotype III could not yet be characterized based on the current variables.

Discussion
Hip involvement is prevalent in AS and constitutes an important cause of disability (2,3). There remains an unmet need for a method that identifies hip involvement in AS early and accurately, as early detection offers the opportunity for timely treatment. Radiomics has gained increasing attention in the last few years as a promising quantitative image analysis method used for differential diagnosis, prognosis analysis and identification of responders to therapy (22,23). In this pilot study, four distinct phenotypes of AS-related hip involvement were identified by integrating MRI radiomics data with an unsupervised ML approach. This study is, to the best of our knowledge, the first to apply a radiomics-based approach to profile hip involvement in AS. Our study validated the clinical actionability of using a radiomics approach to detect hip involvement in AS, which lays a foundation for a novel method, MRI radiomics, to diagnose hip involvement in AS.
A 4-phenotype structure of the radiomics data was derived and validated from the perspectives of clinical background, MRI signs and radiographic outcomes. Firstly, phenotypes I and II were labelled as a high-risk clinical pattern, in that they included more patients exposed to risk factors associated with hip involvement than the other two phenotypes (low-risk clinical pattern). We then used conventional MRI findings to validate the phenotyping structure and interpret the radiomics-based phenotypes, since the 'black-box' nature of artificial intelligence-based approaches often yields results that are difficult to understand (24), and practitioners are more familiar with the clinical implications of MRI findings than with radiomic features. Importantly, the significantly higher prevalence of MRI-detected structural damage in the high-risk than in the low-risk phenotypes strongly supported these clinical patterns. Additionally, patients in phenotype I had notable signs of acute inflammation in addition to structural damage, while phenotype IV was regarded as "enthesitis-predominant", given the prominent enthesitis findings on MRI. Profiling phenotype III was challenging because of its limited number of cases (only 24 patients). Patients in phenotype III were young and less likely to have been exposed to risk factors associated with hip involvement. We cautiously inferred that their nonspecific MRI findings may derive from other origins of hip joint pain, such as stress injury, acute bone marrow edema syndrome or femoroacetabular impingement (25,26), although it is also possible that they represent a stage, probably an early stage, in the progression of AS-related hip involvement.
The radiographic outcomes of hip involvement strongly supported the current phenotyping results. After adjusting for confounding factors, patients with high-risk phenotypes had 3.0-fold higher odds of radiological hip involvement than those with low-risk phenotypes (adjusted OR 2.95 (95% CI 1.10, 7.92)). This finding suggests that radiomics-derived phenotyping could predict the radiographic outcome of hip involvement in AS, which makes the radiomics method a promising tool for the early identification of hip involvement in AS. Additionally, the consensus clustering analysis strengthens the credibility and robustness of our findings. These results indicate that the derived phenotypes are not only statistically sound but also clinically interpretable and meaningful.
Among the reported MRI findings associated with hip involvement in AS, it is not known which have predictive power for worse outcomes or which can discriminate AS-related hip involvement from other causes of hip pain. Our study provides some indirect evidence on this question. Joint effusion is an indirect MRI finding of hip synovitis, and BME is linked to damage and leakage of the bone marrow capillary wall (5). Joint effusion and BME were quite common MRI findings in AS patients with hip joint pain (7), but they varied little among the 4 phenotypes. Erosion, sclerosis and joint space narrowing are structural lesion findings on MRI; their role is quite limited when the target is early diagnosis of hip involvement. Focal fat infiltration likely reflects postinflammatory tissue metaplasia: as the inflammation recedes, fat metaplasia develops in its place (27,28). The prevalence of fatty lesions was comparable in phenotypes I, II and IV (36.8%, 38.2% and 38.0%, respectively), although it was somewhat lower in phenotype III (20.8%). We also found that enthesitis was a prevalent MRI finding in each phenotype and characterized one distinct phenotype of patients. Further studies are needed to dissect the pathophysiologic significance of fat lesions and enthesitis in hip joints and their value in distinguishing AS-related hip involvement from other origins of hip joint pain. It is noteworthy that we evaluated the described MRI signs in a crude manner, recording only whether they were present or absent; the emergence of more sophisticated methods, such as morphological feature analysis, quantitative scoring and radiomic feature analysis, has shed light on the exploration of AS-specific MRI findings (10,29,30).
Our study has several limitations that should be acknowledged. Firstly, there was sampling bias due to various factors, including a relatively young population and a geographical area where the AS population had limited biologics use (31), which may result in a relatively high prevalence of hip involvement. Additionally, we enrolled patients with AS (radiographic axial SpA) rather than non-radiographic axial SpA, which is assumed to be an earlier stage of axial SpA (1). Further research is needed to investigate whether our observations persist across racial and ethnic groups and across the whole SpA spectrum. Secondly, we did not establish a specific prediction model or scoring system for the prediction of hip involvement in AS, which we believe requires further tool development as well as external validation. Rather, we aimed to ascertain the potential of an MRI radiomics approach to profile hip involvement in AS. We believe that the novelty lies predominantly in the described methodology, and perhaps less so in the four detected phenotypes, even though they were comprehensively validated. Finally, patients in phenotype III could not yet be characterized, and the underlying cellular or molecular heterogeneity across the four phenotypes was not studied.
In conclusion, our results serve as a proof of concept that unsupervised ML methods can turn complex radiomics data into an interpretable and clinically meaningful classification of hip involvement in AS. Our findings point to a promising approach to identifying hip involvement in AS, and its added value in clinical decision-making should be evaluated in prospective studies.

FIGURE 1 Workflow for the development and validation of the radiomics-based machine learning model. ROI: region of interest.

FIGURE 2 Example of hip MRI slices showing the extent of the manual segmentation. (A) Regions of interest (ROIs) of bilateral hips labeled in green in the coronal plane. (B) The first slice containing the ROI in the axial plane. (C) The reconstructed 3D volume of the ROI. (D) The last slice containing the ROI in the axial plane.

FIGURE 4 Validation of radiomic-derived phenotypes by consensus clustering. (A) Consensus matrix when k = 4. (B) Consensus CDF curves when k = 2 to 6. (C) Relative alterations in the area under the CDF curve (Delta Area Plot). (D) Cluster-consensus value of each phenotype when k = 2 to 6.

TABLE 1 Characteristics and MRI findings of patients among different phenogroups.