Cardiovascular disease biomarkers derived from circulating cell-free DNA methylation

Abstract Acute coronary syndrome (ACS) remains a major cause of worldwide mortality. The syndrome occurs when blood flow to the heart muscle is decreased or blocked, causing muscle tissues to die or malfunction. There are three main types of ACS: Non-ST-elevation myocardial infarction, ST-elevation myocardial infarction, and unstable angina. The treatment depends on the type of ACS, and this is decided by a combination of clinical findings, such as electrocardiogram and plasma biomarkers. Circulating cell-free DNA (ccfDNA) is proposed as an additional marker for ACS since the damaged tissues can release DNA to the bloodstream. We used ccfDNA methylation profiles for differentiating between the ACS types and provided computational tools to repeat similar analysis for other diseases. We leveraged cell type specificity of DNA methylation to deconvolute the ccfDNA cell types of origin and to find methylation-based biomarkers that stratify patients. We identified hundreds of methylation markers associated with ACS types and validated them in an independent cohort. Many such markers were associated with genes involved in cardiovascular conditions and inflammation. ccfDNA methylation showed promise as a non-invasive diagnostic for acute coronary events. These methods are not limited to acute events, and may be used for chronic cardiovascular diseases as well.


INTRODUCTION
Despite a marked mortality reduction driven by improved diagnosis and medical care, ischemic heart disease, a.k.a. acute coronary syndromes (ACS) remains a leading health burden in the 21st century ( 1 ). There is an utmost demand for novel diagnostic tools that refine stratification and monitoring of patient care to improve therapy choice. Chest discomfort is the predominant initial symptom of A CS , but a combination of criteria is r equir ed to achie v e the diagnosis of acute myocardial infarction. First, patients with persistent electrocardiogram (ECG) abnormalities are defined as ST-segment elevation myocardial infarction (STEMI) patients ( 2 ). STEMI is a life-threatening episode accounting for around 30% of ACS cases that can lead to ventricular fibrillation or sudden cardiac arrest and r equir es immediate intervention ( 3 ). Plasma biomarkers such as cardiac troponin I (TnI) and T (TnT), or creatine kinase isoenzyme MB (CK-MB) determine the e xtent of myocar dial injury and are additionally used for diagnosis of non-STsegment ele vation myocar dial infarction (NSTEMI) (4)(5)(6)(7). Recent advances in cardiac-specific-troponin (cTn) assays such as high sensiti v e car diac Troponin (hs-cTn) have increased their sensitivity and diagnostic value ( 8 ). Although this biomarker is highly sensitive, cardiomyocyte-specific and allows estimation of cell stress injury of only cardiac origin, the effects of se v ere hypoxia on other cell types present in the ischemic area remain unknown. Patients with normal ECG and no rise of myocardial injury markers (hs-cTn) and chest pain at rest are defined as unstable angina (U A). U A pa tients have a lower risk of dea th and may benefit from less invasi v e strategies within 72 h ( 9 ) of symptom onset. The causes of UA can be numerous and independent of a thrombotic e v ent, but patients often get the same invasi v e interv ention as NSTEMI. It is currently unkno wn ho w much the diagnosis of UA pr edicts futur e myocardial infarction, but it is widely anticipated to be a potential warning indication. Respecti v e mar kers could help here in the treatment and pre v ention setting of acute myocardial infarctions. It would also be extremely beneficial to identify new non-invasive biomarkers that could improve the stra tifica tion between thrombotic (Type I MI) or nonthrombotic e v ents (Type II MI) leading to different types of myocardial infarction ( 10 ). More accurate and less invasi v e diagnostic methods are needed. CcfDN A recentl y emerged as a new type of biomarker that is associated with many diseases. Nucleosome-sized DNA fragments released from dying apoptotic or necrotic cells make up the ccfDNA, which circulate for a short time in body fluids before they are cleared mainly by the li v er ( 11 , 12 ). Increased concentrations of ccfDNA have been detected in many conditions: several types of cancer (13)(14)(15), acute and chronic systemic inflammations ( 16 ), sepsis ( 17 ), stroke ( 18 ) and myocardial infarction ( 19 ), arising as potential biomarkers for many pathologies. Recently, se v eral studies demonstrated a strong correlation between ccfDNA and cardiovascular disease risk factors and status (19)(20)(21). For e xample, le v els of ccfDNA hav e been associated with se v erity in AMI patients ( 20 ) as well as with cardiometabolic risk factors (19)(20)(21). However, a whole genome methylation by bisulfite sequencing of ccfDNA has not been utilized so far for stra tifica tion of ACS pa tients. The onl y a pproach in this respect was undertaken by focusing on the methylome of human heart chambers, identifying a cluster of cytosines alongside the FAM101A locus. This cardiac-specific methylated region adjacent to the gene FAM101A has been investigated as a possible biomarker for human cardiomyocyte death ( 19 , 22 ). One of those studies found significant differences in the methylation of the six CpGs in this region on ccfDNA from STEMI patients and sepsis ( 19 ). More commonl y, anal yzing ccfDN A fragments by next-generation sequencing (NGS) is used for the detection of different types of cancer signatures by identifying mutations related to cancer types ( 15 ), monitoring transplant rejection by detecting increased le v els of ccfDNA from organ donor ( 23 ) and fetal genetic diseases screening ( 24 ). Besides variant detection, DNA methylation patterns of ccfDNA enable the quantification of cell death in a tissue specific manner, leading to a better understanding of the pathophysiological processes and can serve as biomarkers for diseases ( 25 , 26 ).
In this work, we used whole genome bisulfite sequencing (WGBS) of patient-deri v ed ccfDNA and computational analysis to associate methylation patterns to ACS types: STEMI, NSTEMI and UA. We observed changes in cell type proportions associated with the disease status using methyla tion pa ttern deconvolution. We obtained differentially methylated regions (DMR) from ACS patients compared to healthy controls, as potential biomarkers for ACS pa tient stra tifica tion. We also de v eloped a more costeffecti v e targeted sequencing approach to validate the potential biomarkers in an independent cohort of patients. Furthermore, we de v eloped an R package and reproducible notebooks for other r esear chers to investigate ccfDNA methylation as potential biomarkers for other diseases. In summary, this work shows the utility of ccfDNA methylation markers in cardiovascular disease diagnostics and serves as a blueprint for expansion of ccfDNA methylation markers into other cardiovascular disease areas.

Ethics appro v al and consent to participate
The study has been conducted according to the declaration of Helsinki and was approved by the Berlin State Ethics Committee in Berlin, Germany (EA4 / 122 / 14 and EA1 / 270 / 16).

P atient r ecruitment and sample collection
A total of 29 individuals wer e r ecruited for the first discovery cohort. This group included 8 healthy individuals (control), 8 STEMI patients, 7 NSTEMI patients and 6 UA patients, with an avera ge a ge of 61.5 ranging from 38 to 84. Tw enty-eight individuals w ere men and one healthy control was a woman. For our validation, using a target sequencing approach, we used a cohort of 2 healthy subjects, 4 STEMI, 3 NSTEMI and 2 UA patients (Table S1).
All included patients gave their consent after being fully informed about the study and its purpose.
Fresh blood samples from patients and healthy controls were collected into Na-Citrate tubes (BD), centrifuged at 1200g at room temperature (RT) for 10 min in a swingbucket centrifuge. The supernatant (SN) was transferred to a 15 ml Falcon tube (Greiner) and the tube was spun again at 1200g for 10 min. Double-centrifuged plasma was then transferred into a screw cap plasma collection tube (3 ml Kisker) and immediately frozen at -80 • C until ccfDNA extraction. STEMI pa tients were immedia tely submitted to PCI after confirmed ST-elevation via ECG and parallel troponin measur ement.NSTEMI ar e pa tients tha t show no ST elevation in the ECG and were tested for hsTroponin T using two-time points and showed either a 15-20% troponin rise or a le v el > 52 ng / ml and were submitted to the catheter lab for PCI within a maximum of 72 h. UA patients do not show a rise in troponin and have a largely normal troponin base le v el, but suf fer from chest pain a t r est and wer e also submitted for PCI below 72 h of admission. All subjects gave their written consent after being fully informed about the study and its purpose before the blood was drawn. Patient characteristics , including age , disease status , concentration and the total amount of ccfDNA obtained, and cardiac biomarkers are described in Table S1.

Measurement of clinical parameters
All clinical parameters and biomarkers wer e measur ed directly in plasma or serum by Labor Berlin using standardized assays and pr ocedures r outinely used in diagnostics for Charite hospital.

ACS classification by classical biomarkers
To classify disease se v erity based on classical biomarkers, m ultinomial lo gistic r egr ession models ar e used via the multinom () function in nnet package. We have created single variable multinomial logistic r egr ession models to predict ACS type: STEMI, NSTEMI and UA. In each model, we used one of the ccfDNA le v els in b lood or classical clinical biomarkers as the predictor variable. Patients which did not have data available for certain markers (NA values) were excluded from the corresponding model. The misclassification rate is calculated as the percentage of misclassified samples output by the model.

ccfDN A e xtraction
ccfDNA was extracted from freshly thawed double centrifuged (1200g at RT) citrate-plasma using the Qiagen QI-Aamp Circulating Nucleic Acid Kit (Cat. #55114) and collected using ultrapure nuclease-free water in a total volume of 50 l and quantified using a Qubit fluorometer 2.0 and the Qubit dsDNA HS Assay kit (Cat. # Q32854). Fragment size was confirmed and validated using Agilent Tape Station 2200 and the ccfDN A screenta pe or the hsD1000 screentape. Total amount of cfDNA was then normalized to the amount of plasma it was extracted from.

Whole-genome bisulfite sequencing (WGBS)
Isolated ccfDNA was sent to Novogene and rechecked for quality and concentration by the company. Bisulfite conversion and sequencing was performed using their low-input BS-seq (PBAT) protocol. Data was analyzed using previously in-house programmed R software packages ( 27 ).

Targeted methylation sequencing
Input amount and pulldown were optimized using sheared genomic DNA (ZymoResearch Cat. # D5014). A total of 10 ng of isolated cell-free DNA per sample, quantified and checked for quality, was used as starting material. For enzymatic conversion as an alternati v e to bisulfite conversion, we applied the NEBNext Enzymatic Methylseq Module (Cat. #E7120S) together with the Nonacus Cell3TMTarget: Libr ary Prepar ation kit. Up to 8 libraries of individuall y ada pter-tagged samples were pooled to a total amount of 1 g for probe hybridization and capture enrichment. Ca ptured library DN A was quantified and quality checked using Qubit Fluorometer and qPCR as well Agilent 2200 TapeStation with High Sensitivity D1000 reagents and screentape. Samples were loaded onto a SP flow cell and sequenced using the Novaseq 6000 platform acquiring 400 million reads.

Cell type / tissue deconvolution
The deconvR R package was created for the omics-based deconvolution of cfDNA to cell types of origin. The deconvolute function within deconvR ( https://github.com/ BIMSBbioinf o/decon vR ) offers f our choices of models, non-negati v e least squares (NNLS, nnls package v.1.4), support vector r egr ession (SVR, e1071 package v.1.7.4), quadr atic progr amming (QP, quadprog package v.1.5.8), and robust linear r egr ession (RLM, MASS package). The best model for methylation-based deconvolution was determined as follows. Using the simulateCellMix from the deconvR package, we simulated a dataset containing 1000 mixed samples based on the r efer ence atlas from Moss et al. ( 26 ) with duplicate CpGs removed (25 tissues / cell types and 6105 CpGs). This simulated dataset was deconvoluted using all four different models and the predictions were compared to the actual cell type proportions, to calculate the accuracies and weaknesses of each model. The lowest root-meansquare error (RMSE) value was obtained with the NNLS model, followed by the RLM model.
The comprehensi v e array-based human cell type methyla tion a tlas as genera ted by Moss et al. ( 26 ) was extended by adding three more heart tissues to the full reference methyla tion a tlas (25 tissues / cell types and ∼390K CpGs) and performing tissue-specific CpG feature selection as f ollows. EPIC Methylation-arra y data f or the right atrium auricular region ( n = 2, ENCSR517JQA and ENCSR280LMY), heart left ventricle ( n = 2, ENCSR515ZCU and ENCSR190PQG) and the coronary artery ( n = 2, ENCSR688OHW and ENCSR582BMR) wer e acquir ed from the ENCODE portal ( 31 ). The methyla tion-array ida t files wer e r enamed to r especti v e r ed / gr een channels and then pre-processed using the script provided via the meth atlas GitHub repository ( https:// github.com/nloyfer/meth atlas ) to normalize the data using an arbitrary r efer ence sample, filter b y P -v alue, sex chromosomes and bead number, and finally remove SNPs and non-CpG sites. The pre-processed ENCODE methylation data was merged with the full r efer ence atlas based on CpG probe ID to construct the raw extended methylation atlas (28 tissues / cell types and ∼390K CpGs). Then CpGs with missing values or sites where the row-wise variance was < 0.1% were omitted and replicates were pooled per tissue.
For the selection of tissue-specific CpGs from our extended atlas two approaches were followed. First, the top 100 most h ypometh ylated and h ypermeth ylated CpGs for each cell type were selected as described by Moss et al. ( 26 ) with the minor adjustment of defining a list of already used CpGs in any tissue, to select another CpG for a tissue if the top hits are already in use. Briefly, the methylation atlas was scaled by dividing each row of the atlas by the summed methylation values of the respecti v e row, then for each cell type, the top 100 h ypermeth ylated CpGs with the highest scaled methylation values were selected and recorded to prevent repeated selection of the same CpGs. This procedure was repeated for the re v ersed scaled methyla tion ma trix to identify the top 100 h ypometh ylated CpGs per cell type. Secondly, using the unscaled extended atlas, the dmpFinder function of the minfi R package ( 32 ) was used to identify for each cell type the 200 most differential CpGs compared to any other tissue.
The sets of tissue-specific CpGs were joined and based on those, neighboring CpGs within a distance of 50 bp and pairwise-specific CpGs were added as explained in Moss et al. ( 26 ) To deconvolve the WGBS ccfDNA samples into cell types of origin, genomic loci were first mapped to CpG probe IDs using the BSmeth2Probe function in deconvR. The genomic locations of CpG probe IDs were specified according to the Illumina Infinium MethylationEPIC v1.0 B5 Manifest File. The cell type of origin proportions per sample were estimated using the deconvolute function of deconvR employing the NNLS model and the extended cell type specific CpG methylation signature matrix (28 Cell types) as r efer ence.

Differential methylation
The dif ferential methyla tion was called on tiled regions (500 bp sliding windows with a step size equal to 500 bp) using the R package methylKit ( 27 ) (version 1.16.0), q -value cutoff 0.01 via SLIM method ( 33 ) and minimal methylation difference of 25%.

DMR annotation
The DMRs were annotated by Genomic Regions Enrichment of Annotations Tool (GREAT) ( 34 ) using the rGREAT package (release 3.12). The model used was the 'Basal plus extension', annotating genes on proximal regions (5 kb upstream, 1 kilobase downstream) plus distal (up to 1000 kb). We also annotate the DMRs for CpG r egions, genomic r egions and r egulatory r egions using the AnnotateR package ( 35 ) and additionally we annotate enhancer regions using the chromHMM bed files from the Roadmap Epigenomics Project ( 36 ) (Core 15-state model).
The list of gene symbols obtained was submitted to DisGeNET (v 7.0), a comprehensi v e pla tform integra ting information on human disease-associated genes and variants ( 37 ), using its R package disgenet2r (v.0.0.9, database = all). The gene symbols obtained from GREAT were converted to Entrez gene ids and used for DisGeNET enrichment analysis (enrichDGN) with R package DOSE (v 3.12) with parameters P -value cutoff < 0.05, adjusted P -value < 0.2, minGSSize = 2, maxGSSize = 500, pAd-justMethdod = BH, background genes = whole human genome). We looked for heart-related disease by searching terms from the disease name and disease class name from DisGeNET, assigning to each DMR the annotation obtained.

Linear models of the methylation levels on DMRs
In order to obtain ACS type associated DMRs, we defined the ACS type as a numeric vector of severity (UA = 1, NSTEMI = 2, STEMI = 3). This follows the classical clinical understanding of ACS where UA is the least se v ere and STEMI is the most se v ere type of ACS. We have used linear r egr ession models using R lm () function, we have constructed following models, where Y is either disease se v erity as explained above or the clinical biomarkers: We have constructed such a model for each DMR. This way we associated DMRs with disease se v erity by testing r egr ession coefficients by t-test, and retained DMRs that are significantly associated with disease se v erity. Similarly, we have also associated DMRs with clinical markers simply building models for each clinical marker where the response variable Y is the clinical marker. In addition, in order to test if DNA methylation le v els provide important value for prediction of disease se v erity ov er ccfDNA le v el we hav e also built a reduced model for each DMR w here onl y ccfDN A le v els are used as a pr edictor variable. We compar ed these reduced models with the full models using ANOVA test. This way we would be able to control for ccfDNA le v els and their contribution to model fit.

DMR validation analysis with targeted sequencing
After the detection of a systematic bias of methylation percentages in the targeted sequencing data (see Supplementary Figure S5), log transformation using pseudo-counts (log((observed + 1) / ( (100 -observed) + 1)) and quantile normalization using the R package's 'pr eprocessCor e' (version 1.58.0) function 'normalize .quantiles.use .target ()' were applied. In addition, ComBat adjustment from the R package 'sva' (version 3.44.0) using the function 'Com-Bat ()' with subsequent back transformation, served to normalize the data. Mean differences (condition -healthy control percentage of methylation) were computed per DMR and compared to the discovery cohort's values. The Pearson correlation coefficient was computed using the R package's 'ggpubr' (version 0.4.0) function 'stat cor ()'. The PCA on DMRs which were associated with disease groups was generated using the R 'stats' (version 4.2.1) 'prcomp ()' function.

Machine learning models for prediction of ACS type
In order to predict the ACS type from ccfDNA methylation profiles, we trained thr ee pr edicti v e models (random forest (RF), partial least squares regression (PLS), penalized multinomial r egr ession (PMR)) on a 70 / 30 train-test-split of the discovery data. We have used PLS and r andomFor ests packages via the caret package to build and test the models. We used the methylation values from 193 DMRs shared between discovery and validation as features. The features wer e pr e-processed by centering and scaling. We have used Cross-Validation (10 fold, repeated 10 times) for model assessment. We then applied these models on the batch normalized validation cohort.

ccfDNA levels and classical clinical biomarkers are predictive of ACS status but can not accurately differentiate ACS types
For diagnosis and patient characterization, we measured classical biomar ker le v els of the study participants (Tab le S1) next to ECG and the patients were assigned to the ACS groups by a clinician according to ESC guidelines (Collet J. P., ESC guidelines, EHJ2021). High-sensiti v e car diac Tro-poninT (hs-cTnT) was mostly undetected in healthy controls and had low le v els in UA patients, and highest values in STEMI and NSTEMI (Supplementary Figure S1A). As expected, the left ventricular ejection fraction (LVEF) was higher in healthy controls (ranging from 64% to 74%) and lower in ACS patients, with some patients from the UA group showing very low values (minimal value of 30%, Supplementary figure S1B). For creatine kinases (CK) we see an increase of the le v els with increasing se v erity of ACS (STEMI exhibits the highest median, followed by NSTEMI and UA medians) (Supplementary Figure S1C-F).
The total amount of ccfDNA extracted per mL of plasma from the 29 discovery samples ranged from 2.68 ng (healthy control) to 60.65 ng (STEMI) (Figure 1 A, Table S1). All ACS groups showed higher le v els of ccfDN A w hen compared with the healthy control group (Wilco x on ranksum test P -values 0.0001 for STEMI, 0.0003 for NSTEMI and 0.0006 for UA), confirming previous findings where ccfDNA le v els were raised on diseased states. Howe v er, total ccfDNA was not statistically different between ACS types, pre v enting the distinction of ACS types solely by ccfDNA le v els. This finding shows that ccfDNA quantity by itself cannot be a very specific marker for ACS stratification, which is also evident when we tried to classify patient samples for the ACS type using ccfDNA le v els in a multinomial logistic r egr ession (Figur e 1 B). In addition, using clinical biomarkers in the same fashion as predictor variables to m ultinomial lo gistic r egr ession models also do not yield highly predicti v e models to distinguish ACS types (misclassifica tion ra te is between 20% and 60% for dif ferent models with different predictor variables, Figure 1 B).
In addition, we checked the Pearson correlation between all the classic biomarkers and the concentration of ccfDNA (for ACS patients only). The classical biomarkers are an important part of the diagnosis although ccfDNA concentration in plasma was inversely correlated with left ventricular ejection fraction (LVEF) ( R 2 = 0.56) and unexpectedly, weakl y inversel y correlated to troponin. On the other hand, ccfDN A was weakl y positi v ely correlated to CK and CKmax and positi v ely correlated to CK-MB ( R 2 = 0.42) (Figure 1 C). These findings show that traditional biomarkers and ccfDNA le v els hav e a comple x relationship. Although ccfDNA le v els show correlation to multiple classical mark-ers, just the ccfDNA le v el cannot account for the variation observed in other markers in this cohort of patients.

Cell type proportions obtained via deconvolution of ccfDNA methylation are associated with ACS
Since ACS causes reduced blood flow to the heart and is associated with inflammatory processes, we expect that the cell types that gi v e rise to ccfDNA might be disease specific. We used methylation-based cell type deconvolution to quantify the cell type composition that gave rise to patient-deri v ed ccfDNA. To this end, we applied WGBS on ccfDNA samples obtained from the ACS patients and healthy controls. Our quality control analysis showed that we had a satisfactory bisulfite (BS) conversion r ate, r anging from 97.92% to 99.63% and the average CpG read coverage ranged from 4.2 sequenced reads to 9.1 (Supplementary Figure S2). Briefly, our deconvolution method le v erages cell type specific methylation patterns in the shape of a signa ture ma trix. The deconvolution algorithm makes use of those patterns to find optimal proportions of cell types that might gi v e rise to the observ ed bulk DNA methylation pattern. We used a cell type specific CpG methylation signa ture ma trix built from a comprehensi v e methyla tion a tlas ( 26 ) extended with additional heart tissue samples (see Methods for details). The deconvolution algorithm outputs the percentage of cell cell types observed in plasma rather than ccfDNA amount attributed to cell types (the ccfDNA per cell type can be found in Supplementary Figure S6 and Table S3). Percentage of cell types in plasma provides a normalized quantity that is not influenced by ccfDNA amount per patient which is variable but uniformly high compared to healthy samples. Overall, ccfDNA associated with blood cells was the most abundant in all samples, with neutrophils presenting the most abundant cells, ranging from 21% (in the healthy control sample) to 61% (in NSTEMI sample). Pr eviously, granulocytes wer e found to be the most abundant blood cells in healthy donors (average 32% of the cell composition) ( 26 ), matching our results for neutrophils, the most abundant type of granulocytes, with an average of 28% for the healthy group. In our data, in all ACS groups, neutrophils were elevated when compared with the healthy contr ol gr oup. The neutr ophils ar e known to infiltrate infar cted areas in the first hours after ischemia and they play a major role in the subsequent acute inflammation ( 38 ). We also could detect increased CD4 + T cells (in NSTEMI and UA). On the other hand, ccfDNA from monocytes was decreased in STEMI and proportions of natural killer (NK) cells, erythr ocyte pr ogenitors and hepatocytes wer e r educed in all ACS types (Figure 2 ). We also note that increased ccfDNA is associated with kidney tissue in all ACS types and specifically in STEMI patients. Acute kidney injury is known to be associated with ACS ( 39 ). Furthermor e, compar ed to healthy donors, ACS patients showed an increase in vascular endothelial cells, heart left ventricle and coronary arteries in ACS type specific manner (Figure 2 ), which indicates there might be ACS type specific tissue damage specific for respecti v e ACS types.

Identifying differentially methylated regions as biomarkers specific to ACS type
In order to further investigate disease-specific ccfDNA methylation markers, we conducted a tiled differential methylation analysis using logistic r egr ession based statistical testing as implemented in the methylkit R package. We compared ACS types to healthy subjects in a pairwise manner. Using a threshold of minimal 25% difference on methylation between the ACS type and healthy subjects and a qvalue maximum of 0.01, we identified a total of 688 DMRs in STEMI patients, 388 in NSTEMI patients and 865 in UA. Of those, 486 w ere STEMI specific, 223 w ere NSTEMI specific and 684 UA specific. Figure 3 A illustrates how the DMRs are shared across the three groups. Those disease specific DMRs are relevant as candidate biomarkers to classify patients in ACS types. Using all identified DMRs, we generated a principal component analysis (PCA). The results show a clear separation of healthy patients from the ACS groups (Supplementary Figure S3). The UA group is also separated from STEMI and NSTEMI, while the latter two groups are not clearly separated from each other. Howe v er, the ability to differentiate UA and healthy patients from NSTEMI and STEMI is already of great use, as the ECG is used as a first stratification of STEMI and NSTEMI. In addition, further filtering of DMRs to strictly disease specific biomarkers are possible as we have shown in the following sections.
Next, we wanted to check if genomics features were associated with the obtained DMRs and if they were associated with disease related genes. For this purpose, we assigned genes that are likely to be regulated by those DMRs using the r egulatory-r egion association tool GREAT ( 34 ). Supplementary Figure S4 shows the number of genes per region for each group, and also the distance from DMRs to the transcription start sites (TSS). Additionally, we annotated DMRs with CpG island-associated featur es, genomic featur es (such as intron / exon, UTRs etc.), Epigenomics Roadmap enhancers and long non-coding RN As (lncRN A). Most DMRs were in the open sea, out of CpG islands (CpG inter) (564 for STEMI, 329 for NSTEMI and 765 for UA) (Figure 3 B). The genomic annotations show most DMRs on introns (around 60% of all DMRs, Figure 3 B) and a large part of it is annotated as enhancers (608 for STEMI, 323 for NSTEMI and 589 for UA) and lncRNAs (340 for STEMI, 230 for NSTEMI and 230 for UA) (Figure 3 B). This points out that differ ences ar e possib ly dri v en by cell type specific r egulatory r egions.
Following this, we also performed a gene set enrichment test (using the curated canonical binomial test) for the genes associated with DMRs. Here we found pathways involved in the regulation of immune response, hemostasis and phagocytosis enriched for the three ACS groups (Figure 4 A). For STEMI, the most significantly enriched pathw ay w as 'Genes in volved in hemostasis' f ollowed by 'Leukocyte transendothelial migration' (adjusted P -values 0.0003 and 0.0004, respecti v ely). The hemostasis gene set contained genes involved in coagulation, a process known to be triggered during ACS e v ents ( 40 ). For NSTEMI, we could not detect enriched pathways using the conservati v e threshold adjusted p-value of 0.05. The lowest adjusted pvalue was 0.12 (unadjusted p-value 0.003, Regulation of p38-alpha and p38-beta). For UA, the most enriched pathways were 'Genes involved in Regulation of IFNA signaling' and 'Fc gamma R-mediated phagocytosis' (both adjusted p-values 0.0001). Fc ␥ receptors (Fc ␥ R) are plasma membrane-associated receptors for IgG and the pentraxins C-reacti v e protein (CRP), a known risk factor for cardiovascular diseases, and its role on inflammation in cardiovascular disorders have been already investigated ( 41 ).
Using the DisGeNET database for gene-disease associations ( 37 ), we assigned the DMR-related genes to diseases. In total, 74.13% of DMRs from STEMI, 72.68% from NSTEMI and 61.62% from UA were annotated with heart-related diseases. The high proportion of DMRs regulating genes involved in heart-related diseases provided us with additional evidence that those DMRs are promising biomarker candida tes. La ter, we ran a Dis-GeNET enrichment analysis to check if the DMRs were statistically enriched ( q -value < 0.05) for ACS related diseases. We found 505 diseases enriched for STEMI, 253 for NSTEMI and 326 for UA. Filtering by cardiovascular and inflamma tory-rela ted diseases, 18 diseases were enriched for STEMI and the most significant was 'Cardiomegaly' ( q -value 0.00038). For NSTEMI, 14 cardiovascular / inflammatory diseases were enriched, with 'Inflammation' being the most enriched ( q -value 0.01). For UA, only six cardiovascular / inflammatory diseases were enriched with 'Congenital Heart Defects' presenting the most enriched ( q -value 0.007). Figure 4 B shows all the cardiovascular / inflammatory diseases enriched in at least one group, with an adjusted P -value cutoff of 0.05.

DMRs are associated with ACS types independently of ccfDNA levels in plasma
Since the classic clinical biomarkers cannot be effectively used to stratify ACS patients in this cohort (Figure 1 B & Supplementary Figure S1), we decided to further investigate the potential of using DMRs for stra tifica tion. W hile the unique 1637 discovered DMRs were all potential markers for separating each ACS gr oup fr om healthy individuals (because the differential methylation analysis was done in comparison to the control), they are not necessarily useful for discriminating ACS types. Ther efor e, we decided to further narrow down DMRs to a subset that are strongly associated with the disease. For this purpose, we used linear models predicting numeric disease se v erity encoding of ACS types or clinical biomarkers from methylation values of DMRs across samples using ccfDNA amount as a covariate (see methods). The disease se v erity encoding simply shows UA as the least se v ere and STEMI as most se v ere w hich is generall y accepted ( 42 ). We have fit such models for each DMR to distinguish DMRs that are associated with disease and clinical biomarkers. We found 254 (15.51%) DMRs significantly associated ( P -value ≤ 0.05) with disease se v erity, showing that those DMRs were good candidates for disease stratification The details of those DMRs can be found in Table S4 and S5). From those, 158 DMRs were annotated as involved in cardiovascular diseases or inflammation using DisGeNET ( 37 ) and 163 overlapped with enhancers from the Roadmap epigenomics project ( 36 ) . Figure 5 A shows the percentage of DMRs significantly associated with clinical biomarkers and ACS type (shown as 'disease') encoded as disease se v erity. Throughout this analysis we controlled for ccfDNA le v els in the linear model by adding it as a covariate. As expected, in almost all the linear models ccfDNA le v els did not hav e a significant association with ACS types encoded as disease se v erity.
Following this, we performed a PCA on the methylation le v els of the 254 DMRs which were significantly associated with the disease group (Figure 5 B). We observed that this narrower list of DMRs achie v ed complete separation of ACS types and healthy samples. In Figure 5 C, we can see the top fiv e DMRs w hich are most significantl y associated with disease se v erity. These analyses suggest that DMRs can be used as biomarkers that can distinguish ACS versus healthy samples as well as different ACS types.

Disease-specific DMRs are validated on an independent patient cohort
In order to validate the discriminati v e DMRs, we de v eloped a targeted sequencing approach and applied it on an independent cohort of patients. We used 2 healthy subjects, 4 STEMI, 3 NSTEMI and 2 UA patients as the validation cohort. The participants' characteristics can be found in Table S1. Overall, we targeted 18831 CpGs that are either involved in disease-specific DMRs or part of the deconvolution signature matrix, and 75% of those targeted CpGs were covered with at least 5 reads for each sample.
We ran a dif ferential methyla tion analysis with the validation samples restricting it to the regions of discovery cohort DMRs (using a q-value cutoff < 0.01). We found that 570 of 615 DMRs for STEMI, 308 of 338 for NSTEMI and 252 of 636 for UA were also differentially methylated on the validation samples (DMR subset characterisation in Supplementary Table S2). When also taking directionality of methylation difference into account to ensure that hypoor h ypermeth yla tion of the valida tion cohort compared to the controls was present with the same effect as in the discovery cohort, the numbers decreased to 566, 306 and 248 DMRs respecti v ely. Howe v er, the ef fect size (methyla tion difference) is in general lower in the validation cohort. As from the 1637 unique discovery cohort DMRs only 1314 unique validation DMRs were available due to sequencing covera ge, percenta ges in the following refer to this subset.
After a ppl ying an absolute 25% cutoff to differential methylation and by keeping the q -value cutoff < 0.01, 171 STEMI, 115 NSTEMI and 82 UA DMRs remained significant. Quantile normalization and ComBat adjustment of methylation per centages ensur ed comparability between sequencing techniques and showed the validation of 23% unique DMRs for the validation cohort. These DMRs satisfy q-value and the stringent methyla tion dif ference cut-offs in the validation cohort as well (Figure 6 A). Howe v er, a larger proportion of discovery cohort DMRs (STEMI: 92%; NSTEMI: 91%; UA: 39%) are significantly differentially methylated in the validation cohort when the stringent methyla tion cutof f is not a pplied and onl y the q -value cutoff is consider ed (Figur e 6 A). In addition, a strong linear correla tion of methyla tion dif ference values between discovery and validation cohorts (NSTEMI: R = 0.98, P < 2.2e-16; STEMI: R = 0.99, P < 2.2e-16; UA: R = 0.98, P < 2.2e-16) was evident. When using the 254 DMRs from the discovery cohort which were identified to distinguish best between ACS types, we could again observe good separation of the different ACS types and healthy subjects via PCA (Figure  6 B) regardless of the different sequencing techniques used in validation and discovery cohorts. Only one STEMI sample from the validation cohort was evident to present as an outlier. In addition, we built multi variate predicti v e models using multiple machine learning methods (random forests: RF, penalized multinomial r egr ession: PMR and partial least squar es r egr ession: PLS, see methods for details). In all cases, the models were trained on the discovery cohort and tested on the validation cohort. We have achieved almost perfect accuracy across models (RF: 0.8182, PLS: 1, PMR: 1, see also Supplementary Figure S7 for cross-validation results). The loss in accuracy in the RF model is due to a single STEMI sample that clusters with the NSTEMI group as can already be seen in Figure 6 B.
Howe v er, most of the studies investigated the increase in the total amount of ccfDNA, and the correlation of ccfDNA le v els with the se v erity of the diseases. While useful, such an approach is limited because it cannot discern which tissue is being damaged, nor can it measure the le v el of injury. The ccfDNA fragments contain DNA methylation patterns which enable their unique assignment to the cell type where the fragment origina ted. ccfDNA methyla tion profiling has been used to determine the cellular contributors of the damaged tissue, for example, in islet transplantation and sepsis ( 26 ).  In the present study, we present a proof of principle approach, by using WGBS to de novo investigate differential methylation profiles of ccfDNA in three different ACS types and a healthy contr ol gr oup. We analyzed both the ccfDNA methylation-based cell composition and the changes of methylation in response to the disease state. Additionally, we used a validation cohort to validate the discovered DMRs, using a targeted sequencing approach. The targeted sequencing a pproach greatl y reduces the profiling costs, increases the speed and makes the method amenable for possible clinical usage.
As previously shown, we found that all ACS groups had increased le v els of ccfDN A, w hen compared to the healthy contr ol gr oup (Figur e 1 ). Inter estingly, both MI groups showed slightly higher amounts of ccfDNA than UA. The difference was howe v er not statistically significant, possibly due to the lack of sta tistical power. A dif ferent stud y reported similar, linearly increasing le v els of ccfDNA from UA, NSTEMI and STEMI ( 43 ). It is important to state that the ccfDNA fragments can originate from different cell types, and not only cardiomyocytes.
In fact, our cell composition analysis showed an increased proportion of neutrophils in A CS , a cell type known to be involved in the immune response to MI ( 38 ). Neutrophils are the first cell type recruited to the site of injury and contribute different functions in cardiovascular diseases ( 44 ). During A CS , neutrophils release extracellular traps. Neutrophil extracellular traps (NETs) are secreted structures formed by decondensed chromatin, histones and neutrophil granular proteins, which have been proposed to contribute to ccfDNA ( 45 ). These structures have prothrombotic activity and their increased le v els ar e r elated to larger infarct size and major cardiovascular events ( 44 , 46 ).
On the other hand, we observed a decrease in the proportions of erythrocyte progenitors in the ACS groups. Contrary to the other cell types, ccfDNA from those cells does not reflect cell death but rather the process of erythrocyte ma tura tion, when the progenitors lose their nuclei ( 26 , 47 ). The decreased proportion of erythrocyte progenitors might reflect a shift in the hematopoietic process, increasing the production / ma tura tion of immune cells while decreasing erythrocyte ma tura tion. Sim ultaneousl y, we also observed an increase of CD4 + T-cells, especially in NSTEMI and UA.
Cumulati v e e vidence from animal models showed a double role of CD4 + T-cells in ischemia-reperfusion injury and tissue recovery ( 48 ). We did not detect ccfDNA from the heart's left and right atriums in any of the ACS groups. These r esults ar e consistent with the rarity of the atrial infarction, due to the lo wer o xygen demand of the atrium, when compared to the ventricle. We detected on average 1% of cells from the heart left ventricle in healthy controls, with small increases in A CS , where NSTEMI was on average the most prominent (howe v er the maximum value was observed in a STEMI patient).
Our dif ferential methyla tion analysis found a total of 1637 DMRs when comparing each of the ACS groups with the healthy controls ( Figure 3 ). Those DMRs clearly separated the ACS groups from healthy controls. These DMRs are mostly located on introns and enhancer regions. The genes they are associated with are clearly associated with coronary artery disease and inflammation (See Figure 4 B). The pa thways associa ted with the same genes indicate immune system related pathways such as 'Leukocyte Migration' and 'IFN signaling ' (Figure 4 A). In addition, one of the most significantly enriched pathwa ys f or the STEMI DMR set was 'Genes involved in hemostasis'. The hemostasis gene set contained genes involved in coagulation, a process known to be triggered during ACS e v ents. In summary, our DMR set can be associated with disease relevant genes and processes.
Howe v er, the initial DMR set lacked the power to accur ately separ ate the STEMI from NSTEMI patients (Supplementary Figure S3). In order to distill the initial DMR set, we have employed linear modeling to find DMRs where the methylation le v els are significantly changed between the different disease groups. This procedur e r esulted in a set of 254 DMRs, and significantly improved the disease patient stra tifica tion (Figur e 5 B). The r eduction in the number of DMRs which are necessary for accurate patient stratification, reduces the sample preparation costs and implies a possible clinical a pplication. Interestingl y, 96 of those DMRs are currently not known to be involved in cardiometabolic diseases or inflammation.
In order to validate our findings, we used a targeted sequencing approach. We managed to cover ∼75% of the targeted regions, which was expected due to the technical limitations of probe design. We are able to validate ∼23% unique DMRs using stringent cutof fs. W hile investiga ting the DMRs, we observed that the majority of ACS related DMRs are located in intr onic regions. The intr onic DNA methylation e v ents wer e also pr eviously associated with the etiology of dila ted cardiomyopa thy ( 49 ) and coronary artery disease ( 50 ).
It is of utmost importance to mention that the methyla tion pa ttern of the released ccfDNA goes beyond the cell death of cardiomyocytes. It highlights that multiple cell types contribute to ccfDNA, through different biological processes in a disease like acute coronary syndr ome. Thr ough the ccfDNA methylation signatur es, we ar e measuring information about a whole spectrum of cellular functions related to acute heart disease, with the final goal of improving both the patient stratification and longterm outcome prediction. Usage of ccfDNA methylation as biomarkers is, however, not limited to acute events, but could be used for measuring disease progression in chronic conditions, such as coronary artery e v ents and heart failure. This utility merits the expansion of ccfDNA methylation biomarkers into larger clinical investigation efforts for cardiovascular diseases.

CONCLUSIONS
Here, we show the potential of using a set of ccfDNA DMRs for ACS patients' stratification without undermining the reliability in identifying of STEMI and NSTEMI via ST elevation through ECG and the standardized guidelines of the gold standard of hsTnT measurements. By using 254 DMRs w e w ere able to separate the groups without any other car diac biomar ker without needing any additional clinical finding or biomarker. We were able to a ppl y a targeted sequencing approach using our identified DMRs in order to validate our findings in the second cohort of patients in a more cost-effecti v e way. We also publish an R package and reproducible code base to carry out similar analysis for different diseases. In the future, we plan to expand the cohort types and sizes and to improve the targeted sequencing method to make it useful as a non-invasi v e tool in clinical settings.

DA T A A V AILABILITY
For reasons of reproducibility scripts are provided on https: //github.com/BIMSBbioinfo/ccfDNA ACSS manuscript with a comprehensi v e Readme.md file on the scripts' content (permanent DOI: https://doi.org/10.5281/zenodo. 8009561 ). Links for public access to methylKit objects (tabix f ormat) f or discovery and validation cohort are provided at the end of the Readme.md file on the github repository link ed abo ve. Sequence data has been deposited at the European Genome-phenome Archi v e (EGA), which is hosted by the EBI and the CRG, under accession number EGAS00001007263.