Hematopoietic stem-cell senescence and myocardial repair - Coronary artery disease genotype/phenotype analysis of post-MI myocardial regeneration response induced by CABG/CD133+ bone marrow hematopoietic stem cell treatment in RCT PERFECT Phase 3

Background Bone marrow stem cell clonal dysfunction caused by somatic mutation is suspected to affect post-infarction myocardial regeneration after coronary artery bypass graft surgery (CABG). Methods Transcriptome and variant expression analyses were performed in the phase 3 PERFECT trial of post-myocardial infarction CABG with CD133+ bone marrow-derived hematopoietic stem cells, comparing patients by change in left ventricular ejection fraction (∆LVEF): myocardial regeneration Responders (R; n=14; ∆LVEF +16% day 180/0) and Non-responders (NR; n=9; ∆LVEF -1.1% day 180/0). Subsequently, the findings were validated in an independent patient cohort (n=14) as well as in two preclinical mouse models investigating SH2B3/LNK antisense or knockout deficient conditions. Findings 1. Clinical: R differed from NR in a total of 161 genes in differential expression (n=23, q<0.05) and 872 genes in coexpression analysis (n=23, q<0.05). Machine Learning clustering analysis revealed distinct RvsNR preoperative gene-expression signatures in peripheral blood correlated to SH2B3 (p<0.05). Mutation analysis revealed more specific variants in NR than in R (R: 48 genes; NR: 224 genes). 2. Preclinical: SH2B3/LNK-silenced hematopoietic stem cell (HSC) clones displayed significant overgrowth of myeloid and immune cells in bone marrow, peripheral blood, and tissue at day 160 after competitive bone-marrow transplantation into mice. SH2B3/LNK−/− mice demonstrated enhanced cardiac repair through augmented kinetics of bone marrow-derived endothelial progenitor cells, increased capillary density in ischemic myocardium, and reduced left ventricular fibrosis with preserved cardiac function. 3. Validation: Evaluation analysis in 14 additional patients revealed 85% prediction accuracy for R vs. NR (12/14 patients) for the identified biomarker signature. Interpretation Myocardial repair is affected by HSC gene response and somatic mutation. Machine Learning can be utilized to identify and predict pathological HSC response.
Funding German Ministry of Research and Education (BMBF): Reference and Translation Center for Cardiac Stem Cell Therapy - FKZ0312138A and FKZ031L0106C; German Ministry of Research and Education (BMBF): Collaborative research center - DFG:SFB738 and Center of Excellence - DFG:EC-REBIRTH; European Social Fund: ESF/IV-WM-B34-0011/08, ESF/IV-WM-B34-0030/10; Miltenyi Biotec GmbH, Bergisch-Gladbach, Germany; Japanese Ministry of Health: Health and Labour Sciences Research Grant (H14-trans-001, H17-trans-002). Trial registration ClinicalTrials.gov NCT00950274


Introduction
The hematopoietic system has traditionally been considered an organized, hierarchical system with multipotent, self-renewing stem cells at the top, lineage-committed progenitor cells in the middle, and lineage-restricted precursor cells, which give rise to terminally differentiated cells, at the bottom [1-3]. However, clonal hematopoiesis of indeterminate potential (CHIP) has been described in patients with hematological and cardiovascular disease and associated with congenital or somatic DNA mutations [4,5]. The question arises which congenital or somatic mutations in stem cell regulatory genes cause hematopoietic clonal advantage and impact cardiovascular pathology [6,7]. SH2B3, which codes for the LNK adaptor protein, is one of the major mutated genes associated with hematopoietic stem cell (HSC) proliferation disorders, such as myelodysplasia, erythrocytosis, or leukemia [8,9]. In genome-wide association studies (GWAS) of cardiovascular patients, the SH2B3 phosphorylation-related missense variant rs3184504 was found to be associated with increased platelet count, monocyte proliferation, hypertension, peripheral/coronary artery disease, autoimmune disease, and longevity [9-15]. Regulation of SH2B3/LNK expression is largely unknown, but it is expected to impact cardiovascular regeneration through c-KIT/CD117-expressing hematopoietic, myeloid, lymphocytic, endothelial, and mesenchymal progenitor cells in blood [9,10]. In contrast, intracardiac SH2B3/LNK expression was found to be associated with the regulation of pressure-overload cardiac hypertrophy [16]. At present, the regulatory role of SH2B3 in stem cell proliferation and inflammation response remains unclear in patients with coronary artery disease, especially in post-myocardial infarction repair leading either to regeneration or to inflammatory fibrosis of the myocardium [9,13].
Furthermore, it is unclear whether a monogenic switch of SH2B3 gene expression or SNP-altered LNK protein function in bone marrow stem cells is able to control cardiac regeneration by altering bone marrow response [9]. Moreover, the frequency and type of SH2B3 clonal mutations in HSC of patients with cardiac disease are unknown and may have an impact on variable pathology. In the recent outcome analysis of the phase 3 clinical PERFECT trial, we investigated intramyocardial transplantation of c-KIT/CD117+/CD133+/CD34+ bone marrow-derived hematopoietic stem cells (BM-HSC) in post-myocardial infarction (MI) coronary artery bypass graft (CABG) patients. We found striking differences in induction of cardiac regeneration in 60% of BM-HSC treated and placebo groups, characterized by a preoperative Machine Learning (ML) signature in peripheral blood (PB) [17]. Responders (R) and non-responders (NR) were significantly different preoperatively: R were characterized by increased peripheral blood c-KIT/CD117+/CD133+/CD34+ circulating stem cells (EPC) and increased thrombocytes, while NR had increased erythropoietin (EPO), vascular endothelial growth factor (VEGF), and N-terminal pro b-type natriuretic peptide (NTproBNP) in preoperative serum [17]. The induced bone marrow stem cell proliferation response in R was suspected to be due to adaptor protein SH2B3/LNK activity [17]. Based on this, we first performed variant and gene expression analyses in PERFECT responders vs. non-responders and compared diagnostic RvsNR signatures (Fig. 1A). We then validated the effect on the R/NR signature switch in SH2B3/LNK-deficient mouse models to investigate the role of HSC dysfunction in cardiac repair. Final evaluation of the signatures was performed in an independent patient cohort and by mouse/man correlation analysis (Fig. 1B,C).

Ethical approval and trial setting
RNA sequencing (RNA-Seq) analysis and mRNA RT-PCR in PB: Samples were taken from informed study patients who gave their written consent according to the Declaration of Helsinki (Approval by the Ethical committee, Rostock University Medical Center 2009; No. HV-2009-0012). Analyses and examinations were performed before unblinding of the trial and under careful adherence to the protection of data privacy (pseudonyms).

Transcriptome Analysis of EDTA blood samples using NGS
RNA of frozen EDTA blood samples was isolated in a three-step procedure: First, the GeneJET Stabilized and Fresh Whole Blood RNA Kit (Thermo Scientific) was used following the manufacturer's instructions. Second, isolated RNA was precipitated with 2.5 volumes of ethanol under high-salt conditions (10% of 3 M sodium acetate, pH 5.2). After DNase digest (Thermo Scientific), the RNA was finally purified using Agencourt RNAClean XP beads (Beckman Coulter). Purified RNA was analyzed on a Bioanalyzer (Agilent) using RNA 6000 Nano Chips (Agilent). Quality-controlled RNA was used to construct sequencing libraries using the Universal Plus mRNA-Seq Technology (Nugen) according to the manufacturer's instructions. Briefly, mRNA was selected by oligo d(T) beads and reverse transcribed, and cDNA from Globin messengers was removed by the Globin depletion module (Nugen). Quality-controlled and quantified libraries were sequenced on a HiSeq1500 system (Illumina) in single-end mode (100 nt read length). For RNA-Seq data analysis, adapter clipping and quality trimming were performed for data pre-processing, and reads were aligned to the hg19 genome (patient data) or the mm10 genome (murine data) with the aid of kallisto. Differential expression analysis was performed using the likelihood ratio test of the DESeq2 package (genes with >2-fold change and a q-value <0.05 were considered significantly differentially expressed). Gene set enrichment analysis (GSEA) and annotation, including functional annotation clustering and functional classification, were performed with Enrichr [18].
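The significance rule stated above (more than 2-fold change and q<0.05) can be sketched in a few lines of Python. This is a minimal illustration, not the DESeq2 workflow used in the study: it assumes the q-values are Benjamini-Hochberg adjusted p-values, and the function names and the input tuples are hypothetical.

```python
from math import log2


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def significant_genes(results, min_fold=2.0, max_q=0.05):
    """Select genes with more than `min_fold` change in either direction and q < max_q.

    `results` is a list of (gene, fold_change, p_value) tuples, where
    fold_change is a positive expression ratio (hypothetical input layout).
    """
    qvalues = benjamini_hochberg([p for _, _, p in results])
    hits = []
    for (gene, fold, _), q in zip(results, qvalues):
        if abs(log2(fold)) > log2(min_fold) and q < max_q:
            hits.append(gene)
    return hits
```

Working on the log2 scale treats a 4-fold increase and a 4-fold decrease symmetrically, which is why the fold-change test uses the absolute value of log2(fold).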

Variant calling from transcriptomic data
The previously preprocessed human RNA-Seq datasets were realigned to the hg19 reference (Ensembl version 94) with STAR (2-pass mode). Variant calling was performed with the GATK toolkit [19] using specialized filters (e.g., variants are only considered if they are confirmed by five independent reads); a comprehensive workflow is shown in Supplement Figure S1.
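The read-support filter mentioned above can be illustrated as follows. This is a sketch under stated assumptions, not the GATK filter configuration of the study: the `Variant` record, its field names, and the reading of "five independent reads" as "at least five reads supporting the alternate allele" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record."""
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_reads: int  # number of independent reads supporting the alternate allele


def filter_by_read_support(variants, min_alt_reads=5):
    """Keep only variants confirmed by at least `min_alt_reads` reads."""
    return [v for v in variants if v.alt_reads >= min_alt_reads]
```

For example, a call supported by four reads would be discarded, while calls with five or more supporting reads would pass.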

Research in context
Evidence before this study The basis for the current work is the randomized, double-blinded, placebo-controlled, multicenter phase 3 PERFECT trial, in which post-myocardial infarction (MI) patients undergoing coronary artery bypass graft (CABG) surgery were treated with intramyocardial CD133+ bone marrow-derived hematopoietic stem cells (BM-HSC) or placebo. At that time we identified a correlation of myocardial regeneration with systemic bone marrow response, characterized by a preoperative biomarker signature in peripheral blood (PB) of 20 angiogenesis and stem cell related factors [17]. An additional outcome prediction obtained by Machine Learning (ML) achieved an accuracy of 85% for responders (R) and 80% for non-responders (NR). Genetic dysregulation of BM-HSC was suspected and is now followed up by gene expression and mutational dysregulation analysis. LNK is an adaptor protein coded by the gene SH2B3 and negatively regulates multiple essential signals in hematopoietic stem cells (HSC). Its regulatory role for BM-HSC in cardiovascular repair remains poorly understood and is investigated in this manuscript.

Added value of this study
In the present series of experiments, we clarified that HSC signaling adaptor gene mutations in SH2B3 contribute to a polygenic gene expression circuit switch including the genes PLCG1, LPCAT2, GRB2, AFAP1, AP1B1, KLF8, and MARK3, favorable for the cardiac healing process in MI patients undergoing cardiac recovery after CABG surgery. An integrative ML analysis of preoperative PB enables highly sensitive clinical diagnosis and prediction of cardiac regeneration response after CABG. It may be used for treatment monitoring for cardiac regeneration and give rise to a patient-specific, ML-supported therapy in the future. Our findings in PERFECT about RvsNR and in SH2B3/LNK−/− mice suggest that the significantly reduced ischemic myocardial damage with preserved cardiac function following MI is mainly due to enhanced angiogenesis in ischemic myocardium.

Implications of all the available evidence
This novel approach of disease genotype/phenotype analysis, combining gene expression, coexpression, and transcript variant calling in a randomized clinical trial, led to the discovery of a polygenic circuit involved in HSC response associated with cardiac regeneration capacity. Subsequently, the findings were verified in animal studies and supported by correlation analysis between human and mouse data. This comparison enabled new insights into adaptor proteins, proliferation signaling, and immune checkpoint regulation controlling vasculogenesis/angiogenesis and cardiac tissue repair. Recovery of cardiac function was observed through up-regulation of HSC/EPC circulation and stimulation of immune progenitor cell (PC) proliferation. Our findings show that mutational changes in gene expression transcripts have important implications for the formulation of new therapeutic strategies to diagnose and enhance cardiac repair by stem cells.

The lentiviral vectors pRRL.U6.Lnk-sgRNA.EFS.dTomato.pre or pRRL.U6.NT-sgRNA.EFS.eBFP2.pre were packaged into viral particles by transfection of 10 µg vector, 12 µg pcDNA3.GP.4xCTE, 6 µg pRSV-Rev, and 2 µg pMD2.G into HEK-293T cells in 10 cm plates using the calcium-phosphate method. Medium change was performed 6-8 h later and viral supernatants were harvested 30 h and 54 h post-transfection. The lentiviral supernatants were pooled and concentrated by ultracentrifugation. Vector titers were determined on lineage-negative mouse bone marrow cells.

Experimental SH2B3/LNK knockout model
The SH2B3/LNK−/− mouse strain was generated as described previously [10]. C57BL/6 mice (CLEA Japan, Tokyo, Japan) were used as WT control mice. GFP transgenic mice (GFP-Tg mice; C57BL/6TgN [act EGFP] Osb Y01) were mated with WT mice or SH2B3/LNK−/− mice to generate WT/GFP mice or SH2B3/LNK−/−/GFP mice, respectively, for BM transplantation (BMT) studies. All experimental procedures were conducted in accordance with the Japanese Physiological Society Guidelines for the Care and Use of Laboratory Animals, and the study protocol was approved by the Ethics Committee of the RIKEN Center for Developmental Biology.

Statistical analysis
The results were statistically analyzed using a software package (Statview 5.0, Abacus Concepts Inc, Berkeley, CA). All values were expressed as mean ± standard deviation (mean ± SD). Comparisons among more than three groups were made using one-way analysis of variance (ANOVA) in Prism 4 (GraphPad Software, San Diego, CA). Post hoc analysis was performed by Tukey's multiple comparison test, Mann-Whitney comparison test, or Bonferroni post hoc test. Differences of p<0.05 were considered to denote statistical significance.
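As a minimal illustration of the summary statistics named above, the following Python sketch computes mean and sample SD for one group and the one-way ANOVA F statistic across groups. The study itself used Statview and Prism; the function names here are hypothetical, and the p-value lookup from the F distribution and the post hoc tests are deliberately omitted.

```python
from statistics import mean, stdev


def mean_sd(values):
    """Return (mean, sample standard deviation) for one group."""
    return mean(values), stdev(values)


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of measurements."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]
    # Between-group sum of squares: group size times squared distance to the grand mean.
    ss_between = sum(
        len(group) * (gm - grand_mean) ** 2 for group, gm in zip(groups, group_means)
    )
    # Within-group sum of squares: squared distance of each value to its group mean.
    ss_within = sum(
        (v - gm) ** 2 for group, gm in zip(groups, group_means) for v in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

A large F indicates that the variation between group means is large relative to the variation within groups; significance is then read from the F distribution with the two degrees of freedom computed above.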

Data analysis with machine learning
Key features were identified and the comprehensive patient data were classified by employing supervised and unsupervised Machine Learning (ML) algorithms. We preprocessed the data by removing features with low variance and high correlation for dimension reduction, following best-practice recommendations. We compared the following supervised algorithms: AdaBoost (AB), Gradient Boosting (GB), Support Vector Machines (SVM), and Random Forest (RF) [20]. We employed classifiers that are suitable for training on small data sets, compared them given little training data, and chose the most appropriate algorithm according to accuracy and robustness towards overfitting [21]. Supervised ML models were 10-fold cross-validated with 100 repetitions. We then applied feature selection for the AB, GB, and RF classifiers to further reduce the number of features to <20. We employed principal component analysis (PCA), t-distributed Stochastic Neighbor Embedding (tSNE), and Uniform Manifold Approximation and Projection (UMAP, https://arxiv.org/abs/1802.03426) for unsupervised machine learning classification and nonlinear dimensionality reduction.
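Two of the steps described above, the variance/correlation feature filter and the repeated 10-fold cross-validation, can be sketched in plain Python. This is an illustration under assumptions rather than the pipeline used in the study: the thresholds (`min_variance`, `max_abs_corr`), the greedy keep-first strategy, the unstratified fold assignment, and all function names are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def filter_features(columns, min_variance=1e-3, max_abs_corr=0.9):
    """Drop low-variance features, then features highly correlated with a kept one.

    `columns` maps feature name -> list of per-patient values.
    """
    kept = []
    for name, values in columns.items():
        if pvariance(values) < min_variance:
            continue  # nearly constant across patients
        if any(abs(pearson(values, columns[k])) > max_abs_corr for k in kept):
            continue  # redundant with an already kept feature
        kept.append(name)
    return kept


def repeated_kfold(n_samples, n_splits=10, n_repeats=100, seed=0):
    """Yield (train_indices, test_indices) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for k, test in enumerate(folds):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, test
```

With 23 patients, 10 folds, and 100 repetitions this generator yields 1000 train/test splits; a classifier would be fitted on each training set and scored on the held-out patients, and the mean and standard deviation of the accuracies would then be compared between algorithms.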

WGCN analysis
Weighted gene coexpression network analysis (WGCNA) was performed by applying the R package "WGCNA" to the human RNA-Seq count data. We first constructed the topological overlap matrix (TOM) of all investigated transcripts (~160,000) using the soft thresholding method. We calculated the eigenvalues of the transcripts and evaluated the adjacency based on distance. We subjected transcripts to hierarchical clustering (average linkage) and assigned transcripts with the dynamic hybrid method into groups. We computed the connectivity based on the interaction partners (k) and evaluated the gene significance, which represents the resulting module membership.
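The soft-thresholding and topological-overlap steps can be illustrated with a small pure-Python sketch. It uses the standard unsigned definitions, adjacency a_ij = |cor_ij|^β and TOM_ij = (Σ_u a_iu·a_uj + a_ij) / (min(k_i, k_j) + 1 − a_ij) with connectivity k_i = Σ_u a_iu; the soft-threshold power β used in the study is not stated, so `beta=6` is only an illustrative default, and the function names are hypothetical rather than part of the WGCNA R package.

```python
def soft_threshold_adjacency(corr, beta=6):
    """Unsigned soft-threshold adjacency: |correlation| ** beta, with a zero diagonal."""
    n = len(corr)
    return [
        [0.0 if i == j else abs(corr[i][j]) ** beta for j in range(n)]
        for i in range(n)
    ]


def topological_overlap(adj):
    """Topological overlap matrix (TOM) for a symmetric adjacency with zero diagonal."""
    n = len(adj)
    connectivity = [sum(row) for row in adj]  # k_i: summed connection strength
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Connection strength shared through all other nodes u.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            denom = min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j]
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / denom
    return tom
```

Hierarchical clustering is then typically run on the dissimilarity 1 − TOM, so that transcripts sharing many strong neighbors end up in the same module.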

Results
In our analysis we integrated clinical genotype and phenotype data as well as experimental gene knockout animal modeling, in which we aimed to unravel and validate diagnostic associations of blood, bone marrow, and heart tissue (Fig. 1). At the phenotypic level, left ventricular function measured by magnetic resonance imaging (MRI) showed recovery, with a mean difference in the primary endpoint outcome ΔLVEF (d.180/0) of +16% in Responders (R) vs. -1.1% in Non-responders (NR) (p<0.01, t-test; Mann-Whitney Rank Sum test) (Table 1). A significant difference was found in R for myocardial capillary perfusion measured by MRI, with increased epicardial (p=0.038, t-test; Mann-Whitney Rank Sum test) and endocardial (p=0.024, t-test; Mann-Whitney Rank Sum test) maximal upslope velocity after 180 days (Table 1).

Gene expression analysis
In addition to previously identified correlating angiogenesis biomarkers and SH2B3/LNK RT-PCR analysis of PB [17], we performed an in-depth gene expression analysis. In order to study transcriptome profile patterns of R and NR signatures, polyadenylated RNA was captured and analyzed by high-throughput sequencing. The experimental procedure included depletion of cDNA derived from Globin messengers to enable high-resolution RNA-Seq in preoperative PB samples from 23 patients (14 R, 9 NR).
Differential gene expression analysis revealed distinct R/NR patterns consisting of 161 significant genes (q<0.05) out of ~160,000 transcripts. The highest significance was found for 122 unique genes (R/NR: q=0.02) (Supplementary Data SD1a). Clustering with all methods was used to examine potentially occurring patient subgroups. Three independent clustering analyses (PCA, tSNE, and UMAP) on all gene expression read counts showed a clear distinction between patients (Fig. 2a, Supplementary Figure 1c). All methods clustered the patients into the same defined subgroups, which did not change. Pathway analysis of differing genes was subsequently conducted on each of the three clusters to investigate the specific differences in gene signaling among these subgroups (Fig. 2a, Supplementary Data SD1b). Then, we performed coexpression analysis by WGCNA, a so-called guilt-by-association approach, to interconnect SH2B3/LNK with similarly regulated transcripts. SH2B3/LNK was identified to be coexpressed within a cluster of 872 genes (Supplementary Data SD1c). The corresponding pathways of the coexpressed genes were the c-KIT receptor signaling pathway as well as EGF, PDGF, TCR, IL6, and Interferon 1 signaling (Table 2).

Gene variant analysis
Transcriptomic mutation signature analysis performed on RNA-Seq data revealed increased specific variants in NR vs. R (NR: 465/178550 variants were contained in all NR, involving 224 genes with 268 exon regions; R: 113/212215 variants were contained in all R, involving 48 genes with 56 exon regions) (Supplemental Data SD1d; Fig. 4a, 4b). DNA sequencing (DNA-Seq) confirmed more than 90% of the variants that were called in SH2B3 from RNA-Seq data (Fig. 4c, Supplemental Table 2). The total numbers of variants, SNPs, and InDel mutations were not different in RvsNR (Fig. 4a, 4b). The main pathways possibly affected by variants were proteasome degradation (NR) and mRNA processing (R) (Table 3). Frequent mutations were present in all top-listed R/NR correlating genes, including SH2B3/LNK, NOTCH2, PDCD1/PD-1, VEGF-B, PLCG1, GRB2, PROM1/CD133, and mTOR, but also in the CHIP-related genes DNMT3, TET2, and ASXL1 that were identified by RNA-Seq SNP calling (Supplementary Data SD1d). Moreover, variants in the reference gene GAPDH used for RT-PCR were observed, with differences in ΔΔCT calculation as compared to Pol2a (Supplementary Data SD1e). Therefore, the RT-PCR gene expression previously used for SH2B3 [17] was not used for the final outcome analysis. In addition, the Src-family adaptor protein coding gene SH2B3/LNK was found to be modified by SNP in DNA-Seq analysis of all patients (23/23), by deletions (100%) or nucleotide exchange (100%), with 83% of SNPs resulting in amino acid substitution (Table 3). The SNP variant rs3184504 (p.Trp262Arg amino acid exchange in the pleckstrin binding domain) was found in DNA-Seq (78%) and in RNA-Seq (83%) (Fig. 4c, Supplementary Data SD1d, Table S2). Patients with rs3184504 SNPs in DNA-Seq (n=17/23) were distributed as follows: R 12/14, NR 5/9. In order to validate the influence of single-gene silencing on the R/NR gene circuit and the resulting phenotype, we studied gene knockout effects of SH2B3 in mouse models.
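The notion of variants "contained in all NR" or "contained in all R" corresponds to a set intersection across the patients of a group. The sketch below shows one possible reading of that step; it is an assumption made for illustration, not the authors' published workflow, and the additional `group_specific` helper (variants shared by every patient of one group and seen in no patient of the other) is a hypothetical interpretation of "specific".

```python
def shared_by_all(patient_variants):
    """Variants present in every patient of a group.

    `patient_variants` maps patient id -> set of variant keys,
    e.g. (chrom, pos, ref, alt) tuples or plain identifiers.
    """
    sets = list(patient_variants.values())
    if not sets:
        return set()
    return set(sets[0]).intersection(*sets[1:])


def group_specific(group, other_group):
    """Variants shared by all patients of `group` and absent from `other_group`."""
    seen_in_other = set().union(*other_group.values())
    return shared_by_all(group) - seen_in_other
```

Counting the genes and exon regions hit by the resulting variant sets would then give per-group summaries of the kind reported in this section.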

Clonal advantage of SH2B3/LNK knockout HSC in an experimental competitive transplantation assay
To show that SH2B3/LNK-deficient hematopoietic stem and progenitor cells (HSPCs) have an advantage in repopulating the bone marrow after partial bone marrow ablation, we performed a competitive transplantation assay. CRISPR-Cas9-mediated SH2B3−/− knockout cells (labeled with a dTomato fluorescent reporter) competed against eBFP2-marked SH2B3-intact competitor cells transduced with a non-targeting sgRNA (Fig. 5A). At week 18 post-transplantation, the bone marrow of transplanted mice consisted of 95.7% ± 2.4% donor cells, which indicated that despite only partial bone marrow ablation, the remaining recipient cells were displaced by donor cells (Fig. 5B).
Blood counts of transplanted mice revealed a normal red blood cell count compared to untreated control mice (Fig. 5C). However, we observed a significant increase of white blood cells and platelets in SH2B3−/− transplanted mice in comparison to control animals (Fig. 5D, E). As early as four weeks after transplantation, dTomato+ SH2B3−/− cells dominated the donor cell population with 64.3% ± 13.8%, which remained stable over time (Fig. 5F). Moreover, dTomato+ SH2B3−/− cells predominantly contributed to hematopoiesis and outcompeted eBFP2+ competitor cells in myeloid, B-cell, and T-cell formation as quantified in the peripheral blood (Fig. 5G). Similar results were obtained for the different lineages (myeloid CD11b+, B B220+, and T CD3+ cells) in the bone marrow and spleen (Fig. 5H, I). Even 68.2% ± 25.5% of Lineage− Sca1+ cKIT+ (LSK) HSPCs in the bone marrow were dTomato+ SH2B3−/− cells, which revealed a repopulating advantage also at the stem cell level (Fig. 5J). The selective advantage of SH2B3−/− was also present in the T cell compartment, including CD4+ and CD8+ single-positive, double-positive, and double-negative T cells in the thymus of transplanted mice (Fig. 5K). To compare gene expression profiles between WT and SH2B3−/− cells, we performed an in-depth gene expression analysis. In order to study transcriptome profile patterns of SH2B3−/− and WT peripheral blood signatures, polyadenylated RNA was captured and analyzed by high-throughput sequencing. To ensure high comparability to the human samples, the cDNA derived from Globin messengers was also depleted.
Figure legend (fragment): The plot shows the ratio of SNP/del sites that are identified in Responders (red) and Non-responders (grey) as well as the possible amino acid transfer from its origin to its potential replacement (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.).
Correlation analysis was performed (Supplementary Figure 3; Supplementary Data SD1f) and showed a correlation (p<0.01) of SH2B3−/− with LPCAT2, NOTCH2, PDCD1LG2/PD-1, PROM1/CD133, ATXN1, and MTOR. SH2B3+/+ was negatively correlated with gene expression of LPCAT2, KITL, and PDCD1/PD-1 and positively correlated with PLCG1 and AP1B1 (Fig. 5L). Affected pathways detected by GSEA include the Kit receptor, EGFR1, and IL-2 signaling pathways (p<0.05).

ML-integrated stratification of patients
Combining gene expression features and biomarkers improved the patient signature for outcome prediction from 81% [17] to 96%, which was achieved using only the top eight ML-selected features (Fig. 7A). Pathway analysis of the most important ML features focuses on genes regulating hematopoietic stem cell receptor and proliferation signaling (Table 2). R/NR selective discriminative factors were identified by combining gene expression difference, coexpression analysis, SNP/variants, and Machine Learning (Fig. 7B; Supplementary Fig. S6). Validation in an independent patient data set revealed 85% accuracy (12/14 patients) for stratification (Fig. 7C) and classification concordance with ML clusters (Fig. 7D, E). All misclassified patients in the primary biomarker cohort analysis (1/23, male R) and the validation cohort (2/14; male R and female NR) exclusively showed a single SNP (rs56313931 G/A) mutation in the X-chromosome-linked target biomarker KLF8 (3/37) (Supplementary Data SD1d).

Fig. 5. Influence of SH2B3 on HSC clonal overgrowth by using competitive bone marrow transplantation of Sh2b3−/− HSPCs. a: Scheme of the competitive transplantation assay. HSPCs, which are derived from a SpCas9 transgenic mouse model (GFP+), were transduced with a lentiviral vector carrying a sgRNA against Sh2b3 and a dTomato fluorescent reporter. As competitor cells, HSPCs were transduced with a non-targeting sgRNA and an eBFP2 fluorescent reporter. After transduction, the Sh2b3−/− and Sh2b3-intact competitor cells were transplanted in a 1:1 mixture into irradiated C57BL/6 (B6, GFP−) recipient mice. Irradiation was performed using a fractionated dose of 2 × 4.5 Gy. b: Percentage of donor (GFP+) and recipient (GFP−) cells of total CD45+ cells in the bone marrow of mice at week 18 after transplantation. c: Red blood cell (RBC) count in Sh2b3−/− transplanted mice and

Discussion
In the randomized PERFECT phase 3 CABG/CD133+ stem cell trial, we observed a preoperative signature of circulating BM-HSC/EPC and angiogenesis parameters in peripheral blood that significantly correlated with the postoperative response of myocardial regeneration, whereas the surgical CABG procedure with complete coronary revascularization showed no difference [17]. Using RNA sequencing analysis in blood, we have found for the first time a corresponding signature for a decreased (NR) or enhanced (R) bone marrow stem cell response upon ischemic/inflammatory response to CABG. Gene expression patterns are characteristic for an RvsNR preoperative steady state and include pathways for proliferation control (EGFR, PDGFR, TCR, c-KIT), inflammation (IL2, IFN1), and platelet activating factor synthesis (LPCAT2). Identified hub genes related to BM-HSC response were mainly genes coding for signaling, adaptor, and transcription regulation proteins.
Our independent ML-based feature selection of the most important RvsNR discriminating factors gives evidence for a gene circuit affecting myocardial repair by modified expression of signaling and adaptor protein gene transcription. Given the complexity of the stem cell response to ischemic or inflammatory stimuli as induced by CABG surgery, we found access to circuit pattern recognition in patients only by unbiased ML methods. Learning from failed reductionist attempts to define single control factors of stem cell reactivity, we propose this approach as an independent and unbiased start. The associations of genes such as PLCG1, LPCAT2, AP1B1, AFAP1, GRB2, KLF8, and MARK3 with circulating CD133+/CD34+ cells and the serum proteins EPO/VEGF were initially identified using ML and subsequently validated experimentally in mice and in an additional patient cohort. Further clustering analysis of the overall patient cohort revealed subgroups representing the complexity of gene expression, coexpression, and mutational variants. Their functional interplay has to be studied in additional experimental models, as shown here by way of example for the Src adaptor protein SH2B3/LNK, which regulates crosstalk between integrin and cytokine signaling pathways in BM-HSC/megakaryocyte proliferation.
Moreover, for the first time multiple somatic mutations involving stem cell functional genes such as NOTCH2, PROM1/CD133, and MTOR as well as the SH2B3 gene were found in PERFECT patients with coronary arteriosclerotic disease. In this context, mutations of SH2B3/LNK were present in all patients, with a majority of responders expressing the exon variant rs3184504 that is associated with coronary artery disease and increased thrombocyte count. Interestingly, human SH2B3 gene expression was not linked to thrombocyte count or circulating CD133+/CD34+ BM-HSC, whereas a correlation was found with LVEF response. It is conceivable that not reduced gene expression, but mutation-related altered SH2B3/LNK protein signaling function impairs control of stem cell proliferation. Therefore, we compared gene expression patterns in a mouse model of CRISPR-Cas9-mediated SH2B3 knockout with human R/NR. Similar to SH2B3/LNK in humans, LNK in the mouse model is an adaptor protein that negatively regulates multiple essential signals, including the SCF/c-KIT system, in stem/progenitor cells. We have previously reported that SH2B3/LNK deficiency accelerated hindlimb ischemia recovery and bone fracture healing [1,24], mainly by restoring local blood perfusion with increased angiogenesis and osteogenesis, respectively. Following the series of SH2B3/LNK-related clinical observations in R/NR, we successfully demonstrated for the first time clonal dominance in hematopoiesis, lymphopoiesis, and myelopoiesis by reduced SH2B3/LNK signaling. To test the hypothesis of a clonal network switch found in silico, we subsequently performed a competitive syngeneic bone marrow transplantation model in mice transplanted with unmodified and SH2B3/LNK knockout HSC. SH2B3/LNK knockout HSC clones displayed significant overgrowth of myeloid and immune cells in bone marrow, peripheral blood, and tissue at day 160 after BMT.
Moreover, the gene expression profile of peripheral blood was similar to the human R/NR signature (Table 4).
Furthermore, in ischemic myocardium using a SH2B3/LNK−/− mouse MI model, we demonstrated that: 1) SH2B3/LNK deficiency increased the number of HSC/KSL stem/progenitor cells (including EPCs) in BM and stem/progenitor cells in myocardium, 2) angiogenic growth factor, survival factor, and stem/progenitor chemokine mRNA expression was up-regulated in SH2B3/LNK-deficient BM/PB HSC stem/progenitor cells, 3) significant mobilization of BM/PB stem/progenitor cells occurred following myocardial ischemia in SH2B3/LNK-negative mice, 4) SH2B3/LNK deficiency reduced myocardial ischemic insult with increased angiogenesis to recruit BM-derived EPCs, and 5) resident stem/precursor cells in the heart proliferated and contributed to tissue regeneration in ischemic myocardium following MI.
Taken together, the obtained transcriptome data reveal that clonal HSC dysregulation leads to specific disease phenotypes, as reduced angiogenesis and sustained myocardial ischemia were observed to determine cardiac recovery outcome in PERFECT trial coronary artery disease patients. Moreover, not specifically CHIP-gene mutations, but distinct regulatory angiogenesis and HSC proliferation pathway gene mutations were observed [5]. These observations support the hypothesis of coronary artery disease (CAD) as a phenotype of hematological stem cell disease and acquired senescence by somatic mutations in the tissue repair gene circuit [4]. From this knowledge, treatment strategies for CAD with stem cells have to be reassessed [9,25]. We have to reconsider the current approach to diagnose and treat hematopoietic stem cell dysfunction as a truly cardiovascular and immune stem cell disease [9]. On this basis, temporary SH2B3 gene downregulation or functional abrogation of LNK protein would be one of the therapeutically effective modifications in currently

Fig. 5 (continued). c (continued): untreated control animals at week 18 after transplantation. d: White blood cell (WBC) count in Sh2b3−/− transplanted mice and untreated control animals at week 18 after transplantation. e: Platelet count in Sh2b3−/− transplanted mice and untreated control animals at week 18 after transplantation. f: Presence of Sh2b3−/− (dTomato+) and competitor (eBFP2+) cells in the donor cell population in the peripheral blood at week 4, 8, 12, and 18 after transplantation. Week 0 shows the presence of dTomato+ and eBFP2+ cells in the transplanted cell population. g-k: Presence of Sh2b3−/− (dTomato+) and Sh2b3-intact competitor (eBFP2+) cells in the indicated lineage of donor cells in the peripheral blood (g), in the bone marrow (h), in the spleen (i), in Lineage− Sca1+ cKIT+ (LSK) HSPCs of the bone marrow (j), and in T cells of the thymus (k) at week 18 after transplantation.
l: Pearson correlation analysis of RNA-Seq data derived from murine Sh2b3 HSC clonal overgrowth model. The Sh2b3 deficiency (red) is highlighted for an improved visual analysis of important correlations. The color scale, ranging from 1 to -1 in the upper panel (blue to red), represents the correlation between the different factors. The size of the dots represents the significance (p<0 . 01, p<0 . 05, and p>0 . 05, Pearson correlation) of the respective correlation. Transplanted mice: n=8. Control mice: n=7. All graphs represent mean § SD. Statistics: c-e: Unpaired t-test after normality test (D'Agostino & Pearson omnibus normality test) was passed; f-i:, k: Two-way ANOVA. j: Kolmogorov-Smirnov test. Significance level: ** p<0 . 01, *** p<0 . 001, and **** p<0 . 0001 (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.).  a: Machine learning accuracy comparison for the supervised prediction of the patient responsiveness using only preoperative data. Results are obtained after feature selection and subsequent prediction with two independent classifiers. The graph shows the true positive prediction results of two ML models (AdaBoost for feature selection and RF for final prediction for the former study and RF and SVM for the current study).The error bars indicate the respective accuracy standard deviation for the constructed models that have been obtained after 100 iterations. * indicates that the 100 model iterations are significant different according to Bonferroni post-hoc test (p<0.01). b: Receiver Operating Characteristics (ROC) curve for the random forest machine learning model. The plot represents the sensitivity (true positive rate) and the specificity (false positive rate) of the model. The area under the ROC curve (AUC) represents the entire area underneath the ROC curve and the confidence intervals (95%CI) are indicated in blue. 
c: Venn diagram summarizing the identified SNPs in RvsNR for the analysis and validation cohort.
d-e: Validation for the clustering with primary cohort (n=23, blue, Rostock trial center biomarker cohort) and independent validation cohort (n=14, green, Hannover center). UMAP representation with k=4 and 2,000 epochs (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.). ongoing autologous progenitor cell transplantation therapy [26]. Moreover, modulation of SH2B3 mutant gene expression in proliferative arteriosclerotic or hypoproliferative ischemic cardiovascular diseases may give rise to the next generation cell based therapy [27]. Temporary downregulation of adaptor gene SH2B3 in cKIT-CD117 + / CD133 + /CD34 + HSC can be a therapeutic switch to improve downregulated stem cell response in ischemia, tissue repair, and myocardial infarction. Using the R/NR signature response described here may also guide specific immune checkpoint or interferon based drug treatment interventions.
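To make the ROC quantities named in the figure legend above concrete (sensitivity as true positive rate, false positive rate as 1 − specificity, and the area under the curve), the following is a minimal Python sketch. The labels and scores are invented toy values, not PERFECT trial data, and the function names `roc_points` and `auc` are hypothetical; the random forest model of the study itself is not reproduced here.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by sweeping a threshold over the scores.

    labels: 1 = responder, 0 = non-responder (toy encoding, an assumption).
    scores: classifier scores, higher meaning "more likely responder".
    Assumes at least one positive and one negative label is present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Sort by descending score so that lowering the threshold admits samples in order.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Admit all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # x = false positive rate (1 - specificity), y = sensitivity.
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data only: 4 responders (1) and 4 non-responders (0).
labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.30, 0.10]

points = roc_points(labels, scores)
print(points)
print(auc(points))
```

For this toy input the AUC is 0.8125, which equals the fraction of responder/non-responder pairs (13 of 16) in which the responder receives the higher score. An AUC of 0.5 would correspond to chance-level ranking and 1.0 to perfect separation.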
At this stage it is speculative, but it may be assumed that the myocardial repair response or non-response to stem cell therapy in coronary artery disease patients could be predicted on the basis of the gene expression signatures found in the PERFECT trial patients. This approach may complement the current evaluation in the CardiAMP Heart Failure Trial, which aims to predict treatment response from the colony forming capacity of bone marrow stem cells [28]. Next-phase studies should test the validity of our integrated data analysis approach, which combines whole transcriptome, protein expression, and cell function data (colony forming units/CFU or Boyden chamber migration assay) with clinical MRI imaging, laboratory, and symptom data, with respect to ML-based clinical outcome prediction accuracy. This may enable treatment of acquired stem cell senescence in a presymptomatic state and aim for restoration of normal bone marrow function. For selection and clonal overgrowth of healthy autologous HSC, resources such as cord blood HSC banking or iPSC technology using non-mutated cell types such as cardiomyocytes for HSC generation can be a desirable option. Clonal selection and expansion biotechnology or allogeneic BMT can be used to treat early or advanced stages of HSC senescence. In cases of advanced HSC senescence, the positive experience of autologous/allogeneic BMT in multiple myeloma patients may be followed [29].
In conclusion, the proposed ML biomarker/gene signature, resulting in 96% classification accuracy, opens a perspective for the analysis of polygenic risk and cardiovascular disease pathomechanism profiling [30]. The first use of an integrative algorithm for gene expression, hub gene coexpression, and transcribed RNA variants derived from RNA-Seq datasets allows identification of patient-specific perturbations. This leads to an individualized pathomechanism switch and targeted treatment as shown for SH2B3/LNK. Moreover, validation of misclassifications can be enabled by whole genome variant analysis as shown here for the X-chromosome linked target gene KLF8. ML-based diagnosis of stem cell based cardiac regeneration capacity in tissue ischemia, infection or vascular repair can be applied and tested for differential diagnosis in heart and degenerative organ disease.

Role of the funding source
The funding source had no role in study design; in the collection, analysis, or interpretation of data; in the writing of the report; or in the decision to submit the manuscript for publication. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.

Declaration of Competing Interest
All authors declare no competing interests.

Contributors
G Steinhoff contributed to study design, trial organization, medical controlling, enrolment and clinical follow-up of patients, research plan, analysis of clinical data, analysis of research data, data collection, data control, data analysis, and drafted the manuscript.