Progression prediction of coronary artery lesions by echocardiography-based ultrasomics analysis in Kawasaki disease

Background Echocardiography-based ultrasomics analysis aids Kawasaki disease (KD) diagnosis, but its role in predicting the progression of coronary artery lesions (CALs) remains unknown. We aimed to develop and validate a predictive model combining echocardiogram-based ultrasomics with clinical parameters for CALs progression in KD. Methods A total of 371 KD patients with CALs at baseline were enrolled from a retrospective cohort (cohort 1, n = 316) and a prospective cohort (cohort 2, n = 55). CALs progression was defined by increased Z scores in any coronary artery branch at the 1-month follow-up. Patients in cohort 1 were split randomly into a training set and validation set 1 at a ratio of 6:4, while cohort 2 comprised validation set 2. Clinical parameters and ultrasomics features at baseline were analyzed and selected for model construction. Model performance was evaluated by area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC) and decision curve analysis (DCA) in the training and two validation sets. Results At the 1-month follow-up, 65 patients presented with CALs progression. Three clinical parameters and six ultrasomics features were selected to construct the model. The clinical-ultrasomics model exhibited good predictive capability in the training set, validation set 1 and validation set 2, achieving AUROCs of 0.83 (95% CI, 0.75–0.90), 0.84 (95% CI, 0.74–0.94), and 0.73 (95% CI, 0.40–0.86), respectively. Moreover, the AUPRC values and DCA of the three models demonstrated that the clinical-ultrasomics model consistently outperformed both the clinical model and the ultrasomics model across the training set and both validation sets. Conclusions Our study demonstrated the effective predictive capacity of a model combining echocardiogram-based ultrasomics features and clinical parameters in predicting CALs progression in KD.
Supplementary Information The online version contains supplementary material available at 10.1186/s13052-024-01739-1.


Introduction
Kawasaki disease (KD) is an acute vasculitis of childhood that mainly affects children under 5 years old, and it is the primary cause of acquired heart disease in children of developed countries [1]. Globally, the incidence of KD has substantially increased over the past decades [2], reaching approximately 107.3 per 100,000 children aged < 5 years in China [3]. Coronary artery lesions (CALs), which consist of coronary artery dilation (CAD) and coronary artery aneurysms (CAA), are the predominant adverse complications of KD. Despite the widespread adoption of standard treatment, approximately 9.0-15.9% of patients with KD may develop CALs [3][4][5]. As research on KD progresses, the focus has shifted from merely assessing the occurrence of CALs to monitoring their changes over the course of the disease [6][7][8].
Several studies reported that 77.4-82.0% of CALs normalize in dimensions within 2 years after KD onset [9,10], whereas the CALs of roughly 24% of patients with initially diagnosed CALs may persist or progress in subsequent evaluations [11]. Notably, progressive CALs correlate with adverse late coronary artery outcomes [12], warranting heightened attention for KD patients experiencing CALs progression. Since primary adjunctive treatment can ameliorate the coronary artery outcomes of KD patients with CALs [13,14], early identification of patients at risk of CALs progression remains vital for improving their prognosis.
Utilizing intricate computer algorithms to extract extensive data from images, ultrasomics can analyze numerous quantitative image features that are challenging to discern with the naked eye [15,16]. A growing number of studies have indicated the potential of radiomics in diagnosing and predicting cardiovascular diseases across multiple imaging methods, such as coronary computed tomography angiography (CCTA) [17,18], cardiac magnetic resonance (CMR) [19,20], and echocardiographic examinations [21,22]. The American Heart Association (AHA) recommends transthoracic echocardiography (TTE) as the primary method for coronary artery assessment in KD patients [1]. Recently, two studies have applied deep learning (DL) algorithms to detect CALs on echocardiographic images, aiding in KD diagnosis [23,24]. However, studies employing echocardiographic images via ultrasomics to identify KD patients at risk of CALs progression have not been reported. Numerous previous studies have explored risk factors for the persistence or progression of CALs based on medical records [6,9,25,26], identifying their associations with coronary artery imaging findings such as CAA size at diagnosis and the number of involved coronary arteries [9,26]. However, the comprehensive investigation of coronary artery imaging features through ultrasomics remains relatively unexplored.
Therefore, we conducted this study to develop and validate a predictive model that combined clinical parameters and echocardiographic ultrasomics features, with the aim of identifying at-risk KD patients in a timely manner and improving the coronary artery outcomes of these children.

Study patients and design
This observational study was conducted at the Department of Cardiology, Children's Hospital Capital Institute of Pediatrics in Beijing, China. Ninety-six patients were excluded for reasons such as incomplete medical records and poor quality of echocardiographic images, leaving 371 patients for subsequent analysis. The present study consisted of a retrospective cohort (cohort 1, n = 316) and a prospective cohort (cohort 2, n = 55). Cohort 1 included KD patients who were hospitalized in our center between April 2018 and May 2022 and was used to train and validate the predictive models. Cohort 2 prospectively recruited eligible patients from June 2022 to June 2023 to assess the generalizability of the model.
The inclusion criteria were as follows: (i) confirmed diagnosis of KD; (ii) hospitalized children; (iii) CALs detected in any of the coronary arteries at the time of KD diagnosis or before intravenous immunoglobulin (IVIG) treatment. The exclusion criteria were as follows: (i) recurrent KD; (ii) no IVIG treatment during hospitalization; (iii) no available follow-up echocardiographic evaluation at 1 month after KD onset; (iv) coexisting congenital heart diseases; (v) subsequent diagnosis of other diseases, such as Takayasu arteritis; (vi) delineation of regions-of-interest (ROIs) restricted by poor image quality.
The study was reviewed and approved by the Institutional Research Board of Children's Hospital Capital Institute (SHERLL2023048). Informed consent was obtained from at least one parent or guardian for each patient.
at the time of diagnosis or before IVIG treatment) were collected. The diagnosis and treatment of KD, the definitions of IVIG resistance and of complete and incomplete KD, and the frequency of echocardiographic evaluation were all based on the AHA (2017) criteria [1].
The coronary artery findings were obtained by experienced ultrasonographers with the Philips ie33 or 7c system. The internal dimensions of the left main coronary artery (LMCA), left anterior descending artery (LAD), left circumflex artery (LCX), and 3 segments (proximal, middle, and distal) of the right coronary artery (RCA) were measured and recorded by echocardiography and converted to Z scores according to the criteria of the Kobayashi Z-score adjusted for body surface area [27]. The maximum Z score (Zmax) was defined as the largest Z score of the four coronary artery branches (LMCA, LAD, LCX, or RCA) on echocardiography.
Given the significant association between the severity of CALs one month after disease onset and late coronary artery outcomes in KD patients [28,29], the present study compared the Z scores of coronary arteries at the 1-month follow-up with their baseline scores. Patients were categorized into 2 groups based on changes in Z scores of coronary arteries between baseline and the 1-month follow-up: (1) CALs-progressed: any of the four coronary artery branches presented an increased Z score at the 1-month follow-up; (2) CALs-improved: no coronary artery presented an increased Z score, and at least one coronary artery showed a reduced Z score at the 1-month follow-up.
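The grouping rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary layout and the example Z scores are hypothetical, one Z score per branch is assumed (how the three RCA segments are collapsed into a single RCA value is not specified here), and no minimum change threshold is applied because none is given in the text.

```python
from typing import Dict, Optional

# The four branches compared between baseline and the 1-month follow-up.
BRANCHES = ("LMCA", "LAD", "LCX", "RCA")


def classify_cal_change(baseline: Dict[str, float],
                        follow_up: Dict[str, float]) -> Optional[str]:
    """Label one patient from per-branch Z scores (hypothetical helper)."""
    deltas = [follow_up[b] - baseline[b] for b in BRANCHES]
    if any(d > 0 for d in deltas):
        # Any branch with an increased Z score counts as progression.
        return "CALs-progressed"
    if any(d < 0 for d in deltas):
        # No increase anywhere and at least one reduced Z score.
        return "CALs-improved"
    # All branches unchanged: not covered by the two definitions in the text.
    return None


# Made-up example values, for illustration only.
baseline = {"LMCA": 2.1, "LAD": 2.6, "LCX": 1.0, "RCA": 1.8}
follow_up = {"LMCA": 1.9, "LAD": 2.9, "LCX": 1.0, "RCA": 1.5}
print(classify_cal_change(baseline, follow_up))
```

Note that under this rule a single branch with an increased Z score outweighs reductions in all other branches, which is consistent with the "any branch" wording of the definition.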

Image screening, ROIs segmentation and feature extraction
Because CALs occur least frequently in the left circumflex artery and the distal RCA, and because image clarity at these sites is poor, the images of the LMCA, the LAD, the proximal segment of the RCA (RCAp), and the middle segment of the RCA (RCAm) were selected for subsequent analysis. All echocardiographic images in DICOM format were anonymized to protect the privacy of the included patients.
The ROIs were then manually segmented by an ultrasonographer (Shuai, Yang) and confirmed by another ultrasonographer with over 15 years' experience in pediatric cardiology (Ai-Mei, CAO), using ITK-SNAP software (version 3.8, www.itksnap.org). The ROIs included the vascular walls and lumen diameter of the typical sites of CALs on the images. Both ultrasonographers were blinded to the diagnosis results during cardiac evaluation and ROI segmentation. Ultrasomics features were extracted in Python (version 3.8.8) using Pyradiomics (version 2.2.0), which complies with the Imaging Biomarker Standardization Initiative (IBSI) guidelines. The intraclass correlation coefficient (ICC) was calculated to assess the reproducibility of feature extraction, and features with an ICC lower than 0.75 were removed.
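The ICC-based reproducibility filter can be illustrated as follows. The text does not state which ICC form was used, so this sketch assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1). The function names and the example numbers are hypothetical.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per ROI and one column per rater.

    Assumption: two-way random effects, absolute agreement, single
    measurement. The source does not specify the ICC form.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-ROI mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def reproducible_features(feature_tables, threshold=0.75):
    """Keep the names of features whose ICC reaches the threshold."""
    return [name for name, table in feature_tables.items()
            if icc_2_1(table) >= threshold]


# Made-up values of two features measured on three ROIs by two raters.
tables = {
    "feature_a": [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9]],   # good agreement
    "feature_b": [[1.0, 3.0], [2.0, 1.0], [3.0, 2.0]],   # poor agreement
}
print(reproducible_features(tables))
```

A constant offset between raters lowers this form of the ICC even when the ranking of ROIs is identical, which is the intended behaviour of an absolute-agreement measure.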

Selection of ultrasomics features
Feature selection was conducted based on the training cohort. All extracted ultrasomics features from each patient were normalized using the Z-score method. Hierarchical analysis was performed based on Pearson's correlation analysis, and redundant features with correlation coefficients > 0.90 were eliminated. Subsequently, an analysis of variance (ANOVA) F-test was used to select the top 30% of features ranked by F-value (each feature has an individual F-value related to the target event).
Nine machine learning algorithms were trained, including random forest (RF), support vector machine (SVM), decision tree, K-nearest neighbors (KNN), gradient boosting machine (GBM), light gradient boosting machine (LightGBM), extreme gradient boosting machine (XGBoost), multi-layer perceptron (MLP), and Bernoulli naive Bayes (BNB). The performance of each algorithm was evaluated by two metrics: accuracy (ACC) and the area under the receiver operating characteristic curve (AUROC). The optimal algorithm was selected after a comprehensive evaluation of these two metrics.
To narrow the range of contributing factors, the importance of each factor was calculated using the SHAP (SHapley Additive exPlanation) tool. After ordering all variables from the highest to the lowest importance, the prediction performance of an increasing number of top factors was appraised by ACC, precision, and AUROC, upon which the minimal number of important variables was determined.
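The three filtering steps (Z-score normalization, removal of features correlated above 0.90, and retention of the top 30% by ANOVA F-value) can be sketched in plain Python. This is a simplified illustration, not the procedure used in the study: a greedy pairwise filter stands in for the hierarchical analysis, and all function names and toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore(col):
    """Standardize one feature column to mean 0 and unit variance."""
    mu, sd = mean(col), pstdev(col)
    return [(v - mu) / sd if sd else 0.0 for v in col]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long columns."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def anova_f(col, labels):
    """One-way ANOVA F statistic of one feature across outcome groups."""
    groups = {}
    for v, y in zip(col, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(col)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    df_between, df_within = len(groups) - 1, len(col) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def select_features(features, labels, r_max=0.90, keep_fraction=0.30):
    """features: dict of name -> values, computed on the training set only."""
    z = {name: zscore(col) for name, col in features.items()}
    kept = []
    for name in z:
        # Assumption: greedy filter, drop a feature that is highly
        # correlated with any feature that has already been kept.
        if all(abs(pearson(z[name], z[k])) <= r_max for k in kept):
            kept.append(name)
    ranked = sorted(kept, key=lambda n: anova_f(z[n], labels), reverse=True)
    n_keep = max(1, round(keep_fraction * len(ranked)))
    return ranked[:n_keep]


# Toy data: 6 patients, outcome 1 = progressed (made-up numbers).
labels = [0, 0, 0, 1, 1, 1]
toy = {
    "a": [1, 2, 3, 7, 8, 9],
    "a_copy": [2, 4, 6, 14, 16, 18],   # redundant with "a"
    "noise": [5, 1, 4, 2, 6, 3],
}
print(select_features(toy, labels))
```

Fitting the normalization and the filters on the training set only, as the text specifies, keeps information from the validation sets out of the selection step.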
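For readers unfamiliar with the two metrics used to compare the algorithms, the sketch below computes them from scratch. It is an illustration with made-up scores, not the evaluation code of the study; AUROC is computed through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(y_true, scores):
    """AUROC as P(score of a positive > score of a negative), ties = 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = CALs-progressed, 0 = CALs-improved.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y_true, y_pred), auroc(y_true, scores))  # 0.75 0.75
```

Accuracy depends on the chosen probability cut-off (0.5 here, an assumption), whereas AUROC summarizes ranking quality over all cut-offs, which is one reason to report both.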

Selection of clinical parameters
The demographic and clinical characteristics, the responsiveness to IVIG treatment, and the laboratory indicators prior to IVIG treatment of enrolled patients in the training set were analyzed to select clinical parameters for CALs progression. The bidirectional stepwise approach was adopted, and significant variables (P < 0.1) were selected for subsequent model construction.

Statistical analysis
The clinical model was constructed using logistic regression analysis with the clinical parameters selected by the bidirectional stepwise approach. The ultrasomics model was established with the ultrasomics features selected as described above. Finally, a clinical-ultrasomics predictive model was developed by integrating the selected clinical parameters and the ultrasomics features using logistic regression. The performance of the three models was assessed by AUROC, area under the precision-recall curve (AUPRC) and decision curve analysis (DCA) in the training and two validation sets.
Results were expressed as numbers and percentages for categorical variables, and median (with inter-quartile range) or mean (with 95% confidence interval) for quantitative ones. Comparisons were performed using Fisher's exact test for categorical variables and the Mann-Whitney U test for quantitative variables. A two-sided p-value < 0.05 was considered statistically significant. All data were collected anonymously.
Statistical analyses were performed in the community edition of PyCharm (edition 2.2.0) on the Windows 10 system with Python (version 3.8.8). Missing data were handled by multiple imputation, implemented with the MICE package in the R programming environment (version 4.0).
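Decision curve analysis compares models by their net benefit at a given threshold probability. The sketch below uses the standard net-benefit formula, TP/n - FP/n x pt/(1 - pt), together with the "treat all" reference strategy. The function names and the example probabilities are hypothetical, and the threshold range examined in the study is not stated here.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Made-up predicted risks for five patients (1 = CALs-progressed).
y_true = [1, 0, 0, 0, 1]
probs = [0.9, 0.2, 0.6, 0.1, 0.4]
for pt in (0.2, 0.5):
    print(pt, net_benefit(y_true, probs, pt), net_benefit_treat_all(y_true, pt))
```

A model is clinically useful at a threshold when its net benefit exceeds both the "treat all" value and zero, the net benefit of treating no one. A decision curve plots this comparison over a range of thresholds.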

Demographic and clinical characteristics of enrolled patients
As shown in the flowchart of the patient selection process (Fig. 1), a total of 371 patients were eventually enrolled based on the inclusion and exclusion criteria. Patients in the retrospective cohort were randomly divided into a training set (n = 189) and a validation set (validation set 1, n = 127) at a ratio of 6:4, whereas patients in the prospective cohort were all included in the prospective validation set (validation set 2, n = 55). The baseline characteristics of all participating patients are presented in Table 1. At the 1-month follow-up, 65 (17.5%) patients were categorized into the CALs-progressed group, while 306 (82.5%) patients were categorized into the CALs-improved group. Except for the significantly higher concentrations of serum fibrinogen (FIB) and interleukin-6 (IL-6) in validation set 2 compared to both the training set and validation set 1 (both P < 0.05), no other significant differences were observed among the three sets, indicating a broadly balanced data distribution among the three datasets.
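The set sizes follow from the 6:4 ratio if the training size is rounded down: 316 x 0.6 = 189.6, giving 189 and 127. A minimal sketch of such a random split is shown below. The seed, the rounding rule and the absence of stratification by outcome are assumptions, since the text does not describe them.

```python
import random


def split_indices(n, train_fraction=0.6, seed=0):
    """Randomly split n patient indices into training and validation parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)   # seed is an arbitrary assumption
    n_train = int(n * train_fraction)      # floor: 316 * 0.6 = 189.6 -> 189
    return indices[:n_train], indices[n_train:]


train_idx, valid_idx = split_indices(316)
print(len(train_idx), len(valid_idx))  # 189 127
```

With only 65 progressed patients overall, an unstratified split can leave the two sets with somewhat different event rates, so a stratified split would be a reasonable alternative.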

Extraction and selection of ultrasomics features
Analysis was conducted on 1484 echocardiographic images, from which 5636 ultrasomics features were extracted within the ROIs. These features encompassed 504 first-order features, 56 shape features, 2100 textural features, and 2976 wavelet-based features.
The performance (accuracy and AUC) of the nine machine learning algorithms in predicting CALs progression in children with KD from ultrasomics features is provided in Table 2. The hyperparameters of the nine machine learning algorithms used in this study are detailed in Table S1. Since the SVM algorithm performed best on the training set, it was chosen for feature selection. Using the optimal SVM algorithm, the cumulative performance of the top 10 factors in descending order of importance was calculated, and the top eight variables had satisfactory predictive power (Table 3). Accordingly, 8 ultrasomics features were initially selected from the training set. After eliminating redundant features with correlation coefficients > 0.90, 6 ultrasomics features were finally used for further analysis, including one neighbouring gray tone difference matrix (NGTDM) feature, four gray-level size zone matrix (GLSZM) features, and one gray level dependence matrix (GLDM) feature. To evaluate the contribution of the 6 selected ultrasomics features, the importance of each was gauged and ranked, as illustrated in Fig. 2. Details on the ultrasomics features, which are described in the documentation of the Pyradiomics package (https://pyradiomics.readthedocs.io/en/latest/features.html#), are listed in the supplementary materials (Table S2).
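The reported counts are internally consistent, which the short check below makes explicit: the four feature classes sum to 5636, and 1484 images correspond to 371 patients with four coronary sites each (LMCA, LAD, RCAp, RCAm). Reading 5636 as the per-patient total across the four sites, and dividing it into per-site counts, is an interpretation rather than a statement from the text.

```python
# Feature counts by class, as reported in the text.
counts = {"first_order": 504, "shape": 56, "texture": 2100, "wavelet": 2976}
total = sum(counts.values())

n_patients, n_sites = 371, 4       # LMCA, LAD, RCAp, RCAm
n_images = n_patients * n_sites

# Assumption: features are extracted per site, so each class count is
# four times the number of features computed for a single ROI.
per_site = {name: value // n_sites for name, value in counts.items()}

print(total, n_images, per_site, sum(per_site.values()))
```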

Selection of clinical parameters
As shown in Table 4, the bidirectional stepwise approach selected three variables to construct the clinical model: the number of coronary arteries involved (OR: 2.40, P < 0.01), albumin (ALB) (OR: 0.90, P = 0.08), and FIB (OR: 1.30, P < 0.01).
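In a logistic regression model each odds ratio is the exponential of a coefficient, so the reported ORs can be converted back to coefficients and combined multiplicatively. The sketch below illustrates this relationship only. The variable names are hypothetical, the units of ALB and FIB are those of Table 4 and are not restated here, and the model intercept is not reported, so absolute risks cannot be computed from these values.

```python
import math

# Odds ratios per one-unit increase, as reported in the text.
odds_ratios = {"n_arteries_involved": 2.40, "ALB": 0.90, "FIB": 1.30}

# Logistic regression coefficients are the natural logs of the odds ratios.
coefficients = {name: math.log(value) for name, value in odds_ratios.items()}


def odds_multiplier(changes):
    """Factor by which the odds of progression change for given increments."""
    return math.exp(sum(coefficients[name] * delta
                        for name, delta in changes.items()))


# Two additional involved arteries multiply the odds by 2.40 ** 2 = 5.76.
print(round(odds_multiplier({"n_arteries_involved": 2}), 2))
# An ALB value 5 units higher multiplies the odds by 0.90 ** 5, about 0.59.
print(round(odds_multiplier({"ALB": 5}), 2))
```

This assumes the effects are linear on the log-odds scale, which is the standard assumption of a logistic regression without interaction or nonlinear terms.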

Predictive models construction and evaluation
The

Discussion
In the present study, we developed three models to predict CALs progression from KD onset to the 1-month follow-up: a model based on clinical parameters, a model based on ultrasomics features, and an integrated model that combined clinical parameters and ultrasomics features. Furthermore, validation across two sets revealed that the clinical-ultrasomics model consistently outperformed both the individual clinical and ultrasomics models.
In our previous study, we developed a predictive model based on clinical parameters to forecast CAL progression at 1 month after KD onset, achieving an AUC value of 0.80 [30]. Echocardiography is currently the primary method for assessing CALs in KD, and ultrasomics technology can detect subtle changes in ultrasound images that are not visible to the naked eye. The present study was therefore designed to explore whether adding ultrasound imaging information can improve the predictive power of the model. The clinical-ultrasomics model achieved an AUC value of 0.84, surpassing the previous model [30]. Additionally, this study broadened the model's applicability and demonstrated its robustness by validating its performance in a prospective cohort, whereas the applicability of the previous model remained unknown because it was validated only internally. To the best of our knowledge, this is the first study to investigate the value of
The clinical model was constructed based on three variables: the number of coronary arteries involved, ALB, and FIB. Extensive research has consistently demonstrated that a greater number of involved coronary arteries and lower serum ALB levels are associated with CALs progression in KD patients [7,9,31,32], in line with our findings. FIB, an acute-phase protein synthesized in the liver under inflammatory or traumatic conditions, is indicative of both hypercoagulation and severe inflammation. Chen et al. [33] observed significantly elevated FIB levels in KD patients compared to healthy controls. Additionally, Liu et al. [34] demonstrated that plasma FIB concentration in KD patients with CALs was notably higher than in those without CALs. Our study extended these findings by establishing an association between FIB levels and the progression of CALs in KD.
In this study, SVM was the best-performing machine learning algorithm for predicting CALs progression in KD patients. SVM classifies observations by constructing a separating hyperplane and, among conventional machine learning algorithms, is the closest to DL, which might account for its optimal performance in this study. The present study identified three texture-based features and three higher-order features obtained by wavelet transformation of the original images. These features partially reflect the texture changes in echocardiograms, suggesting that KD patients at high risk of CAL progression may exhibit texture changes invisible to the naked eye at disease onset. Moreover, the most significant ultrasomics features in the model were all related to the LAD and the proximal or middle segment of the RCA. This may be due to the suboptimal visualization of distal coronary segments by echocardiography compared to other segments [1]. Given that our study was designed based on echocardiographic findings, it is understandable that the most significant ultrasomics features were associated with the LAD and RCA. Although most of the ultrasomics features identified in our study were associated with the middle segment of the RCA, the most significant feature was related to the LAD. Hence, our findings highlight the importance of examining multiple coronary artery sites to gather comprehensive information.
The diagnosis of KD primarily relies on clinical findings, laboratory indicators, and echocardiographic observations, all of which lack specificity. While echocardiographic imaging aids in KD diagnosis, it alone is insufficient because CAD is also present in other febrile diseases [35,36]. In the present study, the ultrasomics model did not surpass the clinical model in predicting CALs progression. This could be attributed to the fact that the clinical model encompassed both laboratory indicators (ALB and FIB) and an echocardiographic finding (the number of involved coronary arteries), and thus drew on a broader spectrum of information than the ultrasomics model. Nevertheless, our findings demonstrate that ultrasomics features extracted from echocardiographic images can enhance the prediction of CALs progression when combined with clinical parameters, as evidenced by the superior performance of the clinical-ultrasomics model across three distinct datasets. This finding is promising, as it offers deeper insights into managing KD patients and provides guidance for future research. As multiple studies have demonstrated the superiority of CCTA over TTE in detecting CALs located in the distal segments and the left circumflex branches of coronary arteries [37,38], exploring CCTA-based radiomics to identify at-risk KD patients with CALs warrants further investigation.
Several limitations should be acknowledged. Firstly, although we validated the prediction models with a prospective cohort in our center, the absence of external validation may restrict the generalizability of our findings. Secondly, our reliance on static baseline echocardiographic images provides only a snapshot of the condition at the time of image capture. Dynamic video records might offer more comprehensive information and thereby enhance predictive capability. Thirdly, as the clinical-ultrasomics model was applied to KD patients with CALs detected by baseline echocardiography to predict the outcome of their CALs at 1 month in the course of KD, access to high-quality images and manual segmentation of typical areas of CALs by echocardiography specialists are prerequisites for its application. Fourthly, the ROIs of CALs were manually segmented in the present study, which could be subjective and introduce observer bias. Semi-automatic or automatic segmentation methods are needed in the future. Last but not least, the imbalance in the number of patients between the two groups within our datasets contributed to the relatively low specificity observed in all three prediction models. Increasing the number of CALs-progressed KD patients to align more closely with the size of the control group may mitigate this issue.

Conclusions
Our study demonstrated that a prediction model combining echocardiogram-based ultrasomics features and clinical parameters has good predictive efficacy in forecasting CALs progression at the 1-month follow-up among KD patients with CALs. This highlights the promising potential of ultrasomics in predicting CALs progression using baseline echocardiographic images, providing clinicians with a valuable tool to detect CALs at their early stages and consequently aiding in the tailoring of individualized treatment for KD.

Fig. 1 The flowchart of the study. Abbreviations: KD, Kawasaki disease; CALs, coronary artery lesions; IVIG, intravenous immunoglobulin

The construction and evaluation of the clinical model, ultrasomics model and clinical-ultrasomics model in the training set and two validation sets are presented in Figs. 3, 4 and 5. The AUROC of the ultrasomics model in the training set (0.69) was lower than that of the clinical model (0.78), while the clinical-ultrasomics model (0.83) exhibited better predictive performance (Fig. 3A). Similar results were observed in the internal validation set and the prospective validation set (Fig. 3B and C), confirming the additional prognostic performance of ultrasomics features. Moreover, the AUPRC and DCA also confirmed that the clinical-ultrasomics model outperformed both the clinical model and the ultrasomics model.

Fig. 2 Top 6 ultrasomics features for predicting CALs progression of KD patients in descending order of importance. Abbreviations: KD, Kawasaki disease; CALs, coronary artery lesions; LAD, left anterior descending artery; RCAm, the middle segment of right coronary artery; NGTDM, neighbouring gray tone difference matrix; GLSZM, gray-level size zone matrix; GLDM, gray level dependence matrix; GLRLM, gray-level run-length matrix.

Fig. 3

Fig. 4 The PRCs for the ultrasomics model, clinical model, and clinical-ultrasomics model in three sets. PRCs for three models in the training set. PRCs for three models in validation set 1. PRCs for three models in validation set 2. Abbreviations: PRC, precision-recall curve; AUPRC, area under the precision-recall curve

Fig. 5 The DCA for the ultrasomics model, clinical model, and clinical-ultrasomics model in three sets. DCA for three models in the training set. DCA for three models in validation set 1. DCA for three models in validation set 2. Abbreviations: DCA, decision curve analysis

Table 2
Prediction of ultrasomics features analyzed by 9 machine learning algorithms for CALs progression in children with KD

Table 3
Distributions of AUC, accuracy and precision with the cumulative number of the top 10 important factors in ascending order. Abbreviations: AUROC, area under the receiver operating characteristic curve; ACC, accuracy

Table 4
Multivariable logistic regression for clinical variables