Identification of a Multiplex Biomarker Panel for Hypertrophic Cardiomyopathy using Quantitative Proteomics and Machine Learning

Hypertrophic cardiomyopathy (HCM) is defined by pathological left ventricular hypertrophy (LVH). It is the commonest inherited cardiac condition, and a significant number of high-risk cases still go undetected until a sudden cardiac death (SCD) event. Plasma biomarkers do not currently feature in the assessment of HCM disease progression, which is tracked by serial imaging, or in SCD risk stratification, which is based on imaging parameters and patient/family history. There is a need for new HCM plasma biomarkers to refine disease monitoring and improve patient risk stratification. To identify new plasma biomarkers for patients with HCM, we performed exploratory myocardial and plasma proteomics screens and subsequently developed a multiplexed targeted liquid chromatography-tandem mass spectrometry-based assay to validate the 26 peptide biomarkers that were identified. The association of discovered biomarkers with clinical phenotypes was prospectively tested in plasma from 110 HCM patients with LVH (LVH+ HCM), 97 controls and 16 HCM sarcomere gene mutation carriers before the development of LVH (subclinical HCM). Six peptides (Aldolase Fructose-Bisphosphate A, Complement C3, Glutathione S-Transferase Omega 1, Ras Suppressor Protein 1, Talin 1, and Thrombospondin 1) were increased significantly in the plasma of LVH+ HCM compared to controls and correlated with imaging markers of phenotype severity: LV wall thickness, mass and % myocardial scar on cardiovascular magnetic resonance imaging. Using supervised machine learning, this six-biomarker panel differentiated between LVH+ HCM and controls, with an area under the curve of ≥ 0.87. Five of these peptides were also significantly increased in subclinical HCM compared to controls. In LVH+ HCM, the six-marker panel correlated with the presence of non-sustained ventricular tachycardia and the estimated 5-year risk of sudden cardiac death.
Using quantitative proteomic approaches, we have discovered six potentially useful circulating plasma biomarkers related to myocardial substrate changes in HCM, which correlate with the estimated sudden cardiac death risk.


INTRODUCTION
Hypertrophic cardiomyopathy (HCM) is a common myocardial disorder characterized by left ventricular hypertrophy (LVH) caused predominantly by mutations in cardiac sarcomere protein genes (1,2). Disease manifestations, including symptoms, are highly variable amongst patients with HCM and occasionally first presentation is with a major adverse sudden cardiac event, including death. Although the genetic architecture of the disease has been substantially resolved, the impact of this knowledge on therapy has been limited largely due to a lack of understanding about the determinants of disease progression. In HCM, genetic testing is used to identify known pathogenic disease-causing mutations, whilst ECG and cardiac imaging tests are used to elaborate the overall phenotype and monitor disease progression. New biomarkers are needed to better guide the intensity of imaging surveillance, refine current risk stratification algorithms, and track disease progression so the impact of existing and novel therapies can be assessed.
Global proteome studies have provided mechanistic insights into many cardiovascular diseases, but there are few such studies in human cardiomyopathies. Recently, we examined myocardial tissue removed from patients with HCM at the time of cardiac surgery using an unbiased label-free proteomic characterisation of myocardium and demonstrated that the tissue proteome of the disease is characterised by dysregulation of metabolic and structural proteins (3).
The aim of the present study was to combine, into one targeted multiple reaction monitoring (MRM) liquid chromatography-tandem mass spectrometry (LC-MS/MS) assay, the peptides identified in the previous myocardial study with peptides identified in a proteomics plasma screen. We then applied this "tier 2" (4) targeted proteomic assay to a prospective cohort of patients with LVH+ HCM and controls, to test whether any of these circulating biomarkers has potential clinical utility for disease monitoring and risk stratification.
There is evidence that cardiac structural changes (crypts (5), anterior mitral valve leaflet elongation, increased apical trabecular complexity (6)), myocardial disarray (7) and aberrant mechanics (8) precede the establishment of LVH in at least some patients with HCM. We therefore additionally studied biomarker levels in the plasma of HCM sarcomere gene mutation carriers before the development of LVH (subclinical HCM). Figure 1 outlines the experimental design. This observational, prospective, single-center study recruited 110 LVH+ HCM patients and 97 healthy volunteers as controls, randomly split into training and validation cohorts, in order to identify, and subsequently verify, differentially expressed proteomics biomarkers and develop them into a preliminary plasma assay. The assay was then prospectively applied to 16 subclinical HCM patients. [...] ≥15 mm) in the absence of loading conditions that could produce the same magnitude of hypertrophy.

Experimental design and statistical rationale
Inclusion criteria for subclinical HCM patients were as follows: (1) confirmed pathogenic or likely pathogenic HCM sarcomere gene mutation; (2) maximal LV wall thickness <13 mm by CMR and mass within the normal range relative to body surface area, age, and gender; (3) sinus rhythm, no LVH, and no pathological Q waves/T-wave inversion on 12-lead electrocardiography (ECG); and (4) no causes of secondary LVH (valve disease, hypertension). For study inclusion, healthy controls recruited from staff at University College London and The Heart Hospital were required to have no personal or family history of cardiac disease. Exclusion criteria for all participants were needle-phobia and a recent history (<1 month) of blood transfusion or haemodialysis. Estimates for five-year risk of sudden cardiac death (SCD) were calculated using the European Society of Cardiology online clinical tool (9,10).

Plasma sample preparation
Whole blood collected from individual participants was centrifuged on-site. Aliquoted plasma samples were stored in a freezer at −80°C until use. All patients with HCM donated blood samples for measurement of N-terminal prohormone of brain natriuretic peptide (NT-proBNP) levels, serum creatinine, and for genetic analysis.

Genetic analysis
Genotyping of all patients with HCM (LVH+ and subclinical groups) was approved by the University College London/University College London Hospital Trust Joint Research Ethics Committee. Blood samples were collected at initial evaluation, and genomic DNA was isolated from peripheral blood lymphocytes using standard methodology. Patients with LVH+ and subclinical HCM were screened using a targeted high-throughput sequencing methodology, and sequencing data were subjected to bioinformatics analysis as previously described (11). Briefly, 2.1 Mb of genomic DNA sequence was screened per patient, covering coding, intronic, and selected regulatory regions of 41 cardiovascular genes. Solution-based sequence capture was used, followed by massively parallel resequencing on Illumina GAIIx. Average read depth in the 2.1-Mb target region was 120. For identified variants, nonsynonymous pathogenic and likely pathogenic variants were selected on frequency (12) and putative functional consequence (13): either missense variants previously published to be associated with the disease or splicing, nonsense, and frameshift variants (11).
Variants affecting the sarcomere genes were classified as either thick-filament (myosin-binding protein C [...] (14). Eighty micrograms (80 µg) were loaded onto a BioRad Any kD™ gradient gel and Coomassie® stained. Nine gel bands from each lane were excised and in-gel digested with trypsin (Promega, UK). Fractions were then analyzed using MSE label-free quantitation. Analyses were performed as previously described (15,16).

MSE label-free quantitation.
All analyses were performed using a nanoAcquity high-performance liquid chromatography (LC) and quadrupole time of flight (QToF) Premier mass spectrometer (Waters Corporation, Manchester, U.K.) as previously described (3). Briefly, peptides were trapped and desalted prior to reverse phase separation using a Symmetry C18 5 µm, 5 mm × 300 µm precolumn. Peptides were then separated prior to mass spectral analysis using a 15 cm × 75 µm C18 reverse phase analytical column.
Peptides were loaded onto the precolumn at a flow rate of 4 µL/min in 0.1% formic acid for a total time of 4 min. Peptides were eluted off the precolumn and separated on the analytical column using a gradient of 3−40% acetonitrile (ACN) [0.1% formic acid] over a period of 90 min and at a flow rate of 300 nL/min. The column was washed and regenerated at 300 nL/min for 10 min using a 99% ACN [0.1% formic acid] rinse.
After all nonpolar and non-peptide material was removed, the column was re-equilibrated at the initial starting conditions for 20 min. All column temperatures were maintained at 35 °C. Mass accuracy was [...] Trypsin was set as the protease, and two missed cleavages were allowed. Protein identification from the low/high collision spectra for each sample was processed using a hierarchical approach where more than three fragment ions per peptide, five fragment ions per protein and more than two peptides per protein had to be matched. Protein identification parameters used in the database search included a <10 ppm mass accuracy tolerance, fixed modification of carbamidomethylation of cysteines and dynamic modifications of deamidation of asparagine/glutamine and oxidation of methionine. Relative protein abundance was calculated using the Hi-3 method (17).
Proteins with >95% confidence identification were exported for comparative analysis using Nonlinear Dynamics Progenesis software. Proteins of interest were identified from this experiment on the basis of confidence score, fold change and clinical relevance.
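As a rough illustration of the Hi-3 calculation referred to above, the sketch below averages the three most intense peptide signals for a protein to give a relative abundance value. The peptide intensities and the function name `hi3_abundance` are hypothetical and chosen only for illustration; they are not data from this study, and the actual processing was carried out in the vendor software.

```python
from statistics import mean


def hi3_abundance(peptide_intensities):
    """Return the mean of the three most intense peptide signals.

    If fewer than three peptides were quantified, the mean of those
    available is returned (an assumption made for this sketch).
    """
    if not peptide_intensities:
        raise ValueError("at least one peptide intensity is required")
    top3 = sorted(peptide_intensities, reverse=True)[:3]
    return mean(top3)


# Hypothetical peptide intensities (arbitrary units) for one protein
# measured in a control sample and a case sample.
control = [1200.0, 950.0, 700.0, 150.0]
case = [2400.0, 1900.0, 1400.0, 300.0]

# Relative abundance of the protein in the case versus the control.
fold_change = hi3_abundance(case) / hi3_abundance(control)
```

In this made-up example the weakest peptide in each sample is ignored, so the fold change depends only on the three strongest signals.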

Development of a tandem mass spectrometry UPLC−MS/MS assay for validation.
Potential biomarkers identified through plasma profiling were combined with those identified through a separate HCM myocardial tissue proteomic profiling experiment (3). In total, 26 proteotypic peptides (Table S2) demonstrated potential as HCM biomarkers and were developed into a multiplexed and targeted proteomic plasma test (Table S3). Proteotypic peptides specific to the proteins of interest were determined either from these label-free proteomics data, by taking one of the three most abundant peptides and selecting the optimum daughter spectra for quantitation, or from the open-source online Global Proteome Machine MRM database at www.thegpm.org (18). Custom synthesized peptides (Genscript, UK) were used to optimize the detection of the peptides in plasma digest matrix. Only peptides that gave good quantitative data, as assessed using a standard curve spiked into plasma, and a good signal-to-noise ratio for the endogenous peptide were used in the assay. Two transitions per peptide were selected for the assay: one for quantitation and one for confirmation. The most abundant clean transitions without interfering non-specific peaks were selected using the synthetic peptides spiked into matrix.

MRM-LC-MS/MS assay sample preparation.
Ten microlitres (10 µL) of plasma was precipitated with 40 µL of ice-cold 10% trichloroacetic acid (TCA) in acetone. Samples were vortexed and incubated on ice for 1-2 hours, then centrifuged for 10 min at 500 g. The supernatant was discarded and the pellet washed in 1 mL of ice-cold acetone. Pellets were centrifuged, the supernatant removed, and the pellets freeze-dried overnight. Freeze-dried pellets were re-suspended in 20 µL of 100 mM Tris, 1% amidosulfobetaine-14 (ASB-14), pH 7.8, containing 6 M urea and 5 pmol heavy labelled peptide internal standard, and agitated at room temperature for 60 min. Disulfide bridges were reduced by the addition of 3 µL of 100 mM Tris-hydrochloride (Tris-HCl), pH 7.8, containing 20 mM 1,4-dithioerythritol (DTE), followed by incubation at room temperature for 60 min. Free thiol groups were carbamidomethylated by incubation with 6 µL of 100 mM Tris-HCl, pH 7.8, containing 20 mM iodoacetamide at room temperature for 45 min. The reaction mixture was then diluted with 155 µL of water and vortexed, and 150 ng of sequence grade trypsin (Promega, UK) was added to the solution. Samples were incubated overnight (12−16 h) at 37 °C in a water bath. Digested peptides were cleaned and desalted using C18 Bond Elut (Agilent UK) as described previously (19).

MRM LC-MS/MS analysis.
Dried peptides were re-suspended in 100 µL of 3% acetonitrile (ACN)/0.1% trifluoroacetic acid (TFA). Ten microlitres (10 µL) of each sample were injected into a Xevo TQ-S triple quadrupole mass spectrometer coupled to a Waters standard Acquity ultra-performance liquid chromatography (UPLC) system (Waters PLC, Manchester UK). The instrument was operated in positive ion mode. The capillary voltage was maintained at 3.7 kV with the source temperature held constant at 150 °C. A Waters Acquity UPLC Cortecs C18+ column (1.7 µm, 2.1 × 100 mm) attached to a C18+ VanGuard precolumn was used for separation with solution A (99.9% LC-MS grade water with 0.1% formic acid) and solution B (99.9% LC-MS grade ACN with 0.1% formic acid). The flow rate was set to 0.8 mL/min with a linear gradient of 0 to 97% solution A over 7 min. The total run time was 10 min. Pooled plasma digest was used as a quality control (QC) and was run every 10 injections. The peptides and transitions of the final biomarker panel are given in Table S3. QCs were monitored throughout the run, and a coefficient of variation within ±15% was considered acceptable. A standard curve (0-40 pmol) was run at the start and end of the run. Data were analyzed using TargetLynx software (Waters, UK). Integrated peak areas were expressed as a ratio to the internal standard, and amounts in pmol were read from the standard curve.
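To make the quantitation step concrete, the following minimal sketch fits a straight line to a standard curve of peak-area ratio (analyte to internal standard) against spiked amount, then reads an unknown sample off that line. It assumes a simple unweighted least-squares fit; the fitting options actually used in the vendor software are not stated in the text, and all numbers and names (`fit_line`, `quantify`) are hypothetical.

```python
def fit_line(x, y):
    """Unweighted least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(analyte_area, internal_standard_area, slope, intercept):
    """Convert a peak-area ratio to pmol using the fitted standard curve."""
    ratio = analyte_area / internal_standard_area
    return (ratio - intercept) / slope


# Hypothetical standard curve: spiked amount (pmol) vs. peak-area ratio.
standards_pmol = [0.0, 5.0, 10.0, 20.0, 40.0]
area_ratios = [0.0, 0.5, 1.0, 2.0, 4.0]

slope, intercept = fit_line(standards_pmol, area_ratios)

# Hypothetical unknown: analyte and internal-standard peak areas.
amount = quantify(analyte_area=15000.0, internal_standard_area=10000.0,
                  slope=slope, intercept=intercept)
```

Expressing each analyte as a ratio to a co-eluting heavy-labelled internal standard is what lets run-to-run variation in injection and ionisation largely cancel before the curve is applied.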

Protein interaction network analysis
A human protein interaction network was constructed for the six promising HCM biomarkers using interaction data gathered from the publicly available database IntAct (20) and filtered to remove non-human nodes and interactions, interactions with chemicals, self-loops and duplicated edges. Cytoscape (21) (version 3.0.2) was used as a visualization tool. Data presented in Figure S2 are from an advanced IntAct search limited to the species Homo sapiens, with interaction confidence MiScores ranging between 0.27 and 0.74.
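The filtering steps described above can be sketched as follows. This is an illustration only: the edge records, their field names and the placeholder identifiers (`P1`, `M1`, `C1`) are hypothetical, and the study's own filtering was presumably done on IntAct exports rather than on a structure like this one.

```python
HUMAN_TAXID = 9606  # NCBI taxonomy identifier for Homo sapiens


def make_edge(a, b, taxid_a=HUMAN_TAXID, taxid_b=HUMAN_TAXID,
              type_a="protein", type_b="protein"):
    """Build a hypothetical interaction record for this sketch."""
    return {"a": a, "b": b, "taxid_a": taxid_a, "taxid_b": taxid_b,
            "type_a": type_a, "type_b": type_b}


def filter_edges(edges):
    """Keep human protein-protein edges; drop self-loops and duplicates."""
    seen = set()
    kept = []
    for e in edges:
        if e["taxid_a"] != HUMAN_TAXID or e["taxid_b"] != HUMAN_TAXID:
            continue  # non-human node
        if e["type_a"] != "protein" or e["type_b"] != "protein":
            continue  # interaction with a chemical
        if e["a"] == e["b"]:
            continue  # self-loop
        key = frozenset((e["a"], e["b"]))  # undirected edge identity
        if key in seen:
            continue  # duplicated edge
        seen.add(key)
        kept.append((e["a"], e["b"]))
    return kept


edges = [
    make_edge("P1", "P2"),
    make_edge("P2", "P1"),                      # duplicate of the first edge
    make_edge("P3", "P3"),                      # self-loop
    make_edge("P1", "M1", taxid_b=10090),       # mouse interactor
    make_edge("P2", "C1", taxid_b=None, type_b="chemical"),
    make_edge("P2", "P4"),
]

filtered = filter_edges(edges)
```

Treating edges as unordered pairs means that A-B and B-A count as the same interaction, which is the usual convention for an undirected interaction network.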
The final interactome topology was cross-checked with that generated in Interactome3D (22). A second protein interaction network for the six biomarkers, including non-human nodes and edges, is presented in Figure S3. The complete list of proteins and interaction metadata are provided in Data file S1.
[...] acquired as previously described (26). Native T1 mapping was performed on a basal and mid LV SAX slice in diastole using the shortened modified Look-Locker inversion recovery (ShMOLLI) sequence (5b(1b)1b(1b)1b, WIP# 448) (27). Standard late gadolinium enhancement (LGE) images were acquired using a fast low angle single shot inversion recovery sequence following a contrast bolus of 0.1 mmol/kg of gadoterate meglumine (Dotarem; Guerbet, Paris, France). For septal T1 measurements, a region of interest of standard size in the mid septum (segment 8) was manually drawn to avoid the blood-myocardial boundary with a 20% offset. Standard LGE images were analyzed by two observers, each with 6 years of experience in CMR (GC, SR), using the same software (cvi42, Circle CVI) (28). Total LGE volume was quantified using the signal threshold versus reference mean (STRM) semi-automated technique with an STRM-based threshold of > 3 standard deviations (SD) above the mean signal intensity of reference myocardium as previously described (28).
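The thresholding rule behind the STRM technique can be illustrated with a few lines of code. The sketch below flags myocardial pixels whose signal exceeds the reference mean by more than 3 SD and reports the enhanced fraction. The pixel values are invented, the use of the sample standard deviation is an assumption, and real analysis software works on contoured image volumes rather than flat lists.

```python
from statistics import mean, stdev


def lge_fraction(myocardium, reference, n_sd=3.0):
    """Return the STRM threshold and the fraction of pixels above it.

    myocardium: signal intensities of all myocardial pixels
    reference:  signal intensities of the reference (remote) myocardium
    """
    threshold = mean(reference) + n_sd * stdev(reference)
    enhanced = [value for value in myocardium if value > threshold]
    return threshold, len(enhanced) / len(myocardium)


# Hypothetical signal intensities (arbitrary units).
reference = [98.0, 100.0, 102.0, 100.0, 100.0]
myocardium = [99.0, 101.0, 103.0, 120.0, 150.0, 100.0, 97.0, 104.0]

threshold, fraction = lge_fraction(myocardium, reference)
```

Multiplying the enhanced fraction by the myocardial volume would give an LGE volume; that step is omitted here because it depends on voxel dimensions that the text does not report.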

Statistics
Statistical analysis was performed in R (29) (version 3.0.1). Distribution of data was assessed on histograms and using the Shapiro-Wilk test. Continuous variables are expressed as mean ± 1 SD; categorical variables, as counts and percent. The unpaired t-test was used for the comparison of normally distributed data between HCM patients and controls, and the χ² or Fisher's exact test for noncontinuous variables. Chromatograms were analyzed using Waters TargetLynx software. Peak integration was checked by manual inspection to correct for false assignments. Data were exported to Microsoft Excel (Microsoft, Redmond, WA, USA). Peptide levels, which were non-normally distributed continuous data, were compared between LVH+/subclinical HCM cases and controls using the nonparametric Mann-Whitney-Wilcoxon test with P value adjustment for multiple comparisons by the Bonferroni method. Correlations were calculated using Spearman's rho or point biserial correlation as appropriate. The best biomarker panel for LVH+ HCM was built using supervised machine learning (ML) with support vector machine (SVM) classification as previously described (30). SVMs are a set of effective, supervised non-parametric ML techniques that analyze data and recognize patterns. They are increasingly being applied to proteomic datasets for classification and regression analysis (31,32), and are especially suited to two-group separation challenges like the one presented in the current work. The goal of our SVM model was to use the proteomics biomarker panel to predict which phenotypic category a participant belonged to (LVH+ HCM or control), based on an initial training set. Performance of the tuned SVM was then verified in the validation dataset, and overfitting was avoided through the implementation of 10-fold cross-validation (R package 'e1071'). We constructed the SVM with a radial kernel tuned to cost 2 and gamma 1 to derive ML prediction scores per participant.
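For readers less familiar with the method, the radial-kernel SVM described above can be written in standard textbook form as follows. This is the general formulation with the stated tuning values substituted (cost C = 2, gamma γ = 1). The symbols x_i, y_i, α_i and b denote the usual training vectors, class labels (±1), fitted dual coefficients and intercept; they are generic notation and not values reported in this study.

```latex
K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \,\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right), \qquad \gamma = 1

f(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b, \qquad 0 \le \alpha_i \le C, \quad C = 2
```

Here x would be the vector of six peptide levels for one participant, and the sign of f(x) determines the predicted class. A larger cost penalises training errors more heavily, while a larger gamma makes the kernel more local.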
For the optimal model, area under the receiver operating characteristics curve was calculated using package 'ROCR'. P values are 2-sided and considered significant when < 0.05. Data sets used in this analysis are provided in Data file S2.
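The area under the ROC curve has a simple interpretation: it is the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition. The scores are invented for illustration, and the function name `roc_auc` is hypothetical; it is not the ROCR interface.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0   # case ranked above control
            elif case == control:
                wins += 0.5   # tie counts as half
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical prediction scores for four cases and four controls.
case_scores = [0.9, 0.8, 0.7, 0.4]
control_scores = [0.6, 0.3, 0.2, 0.4]

auc = roc_auc(case_scores, control_scores)
```

The pairwise loop is quadratic in sample size, which is acceptable for cohorts of a few hundred participants; rank-based formulas give the same value more efficiently.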

Characteristics of study participants
Clinical and demographic characteristics of the cohorts are provided in Table 1. The training and validation LVH+ HCM cohorts were matched to controls except for age in the validation cohort. In this regard, we show that individual proteomics biomarker levels as well as the ML prediction score exhibit no correlation with age (Table 2). Genetic variants identified in LVH+ and subclinical HCM patients are summarized in Table S4.

Overall workflow and quantitative reliability of plasma MRM analysis
Targeted proteomic MRM analysis of 26 candidate peptides (Table S2) was performed blindly for 223 individual plasma samples over 3 days. Eleven of the 26 peptides were filtered out from further analysis because their levels were below the limits of reliable detection in plasma. Of the remaining peptides that could be reliably detected, six (Fig. 2) showed significant differential expression between LVH+ HCM cases and controls in the training dataset (Table 3, Fig. 3). A mean standard curve linearity of R² = 0.95 ± 0.04 was achieved across the calibration curves for the six candidate peptides of interest (Table S5 and Data file S3).
The mean coefficient of variation for target peptides between replicate quality control samples was 9.8 ± 3.3% (Table S5 and Data file S4). The MS proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (33) partner repository with the dataset identifier PXD009859.
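The quality-control figure reported above is a coefficient of variation, that is, the standard deviation of replicate QC measurements divided by their mean and expressed as a percentage. The sketch below computes it per peptide and checks it against a 15% acceptance limit. The replicate values and peptide labels are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

ACCEPTANCE_LIMIT_PERCENT = 15.0  # QC acceptance limit stated in the methods


def coefficient_of_variation(values):
    """Return the coefficient of variation of replicate values, in percent."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical QC replicate measurements (pmol) for two peptides.
qc_replicates = {
    "peptide_A": [9.0, 10.0, 11.0],
    "peptide_B": [18.0, 20.0, 22.0],
}

cv_by_peptide = {name: coefficient_of_variation(values)
                 for name, values in qc_replicates.items()}
all_acceptable = all(cv <= ACCEPTANCE_LIMIT_PERCENT
                     for cv in cv_by_peptide.values())
```

Because the coefficient of variation is scale-free, peptides measured at very different absolute levels can be judged against the same limit.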

Biomarkers differentially expressed in the plasma of HCM patients compared to controls
In the LVH+ HCM vs. controls training dataset, the levels of six proteotypic peptides (Aldolase Fructose-

Correlation of biomarker levels with clinical, humoral, genetic and imaging variables
Correlations are reported in Table 2

Interaction network analysis
Human protein interaction network analysis of the six biomarkers (Figs. S2 and S3) highlights their predominant connections to HCM pathophysiology (38). The complementary interactome enriched with non-human nodes and edges (Fig. S3) implicates the enzyme glutamine gamma-glutamyltransferase 2 (TGM2, UniProtKB: P21980) as the interacting node between C3 and THBS1.

DISCUSSION
Using a pipeline that combined potential biomarkers discovered through proteomic profiling of heart tissue and plasma with others identified in the literature (39), we assembled a list of 26 possible biomarkers which were put forward for verification. From these, we were able to accurately quantify and confirm six biomarkers that were elevated in the plasma of patients with LVH+ HCM compared to controls. These six multiplexed plasma markers (1 extracellular, 2 intracellular, 2 enzymes, 1 complement component) were related to myocardial substrate changes in HCM, as suggested by their correlation with LV wall thickness, LV mass, % myocardial scar, presence of NSVT and the 5-year risk estimate for SCD (37). Adverse clinical outcomes in patients with HCM, such as cardiac arrhythmias, heart failure and SCD, are thought to be related to myocardial substrate changes including myoarchitectural disarray, fibrosis and small vessel disease. We found that five of the biomarkers identified in LVH+ HCM patients were also elevated in the plasma of a limited number of patients with subclinical HCM where, in spite of the absence of LVH by standard imaging methods, myocardial substrate changes are already presumed to exist (40,41). There is a growing body of evidence to suggest that such myocardial substrate changes in subclinical HCM may be prognostically relevant. For example, in a post-mortem histopathological study of subclinical HCM, the hearts of four related SCD victims with apparently normal LV mass and wall thickness demonstrated widespread myoarchitectural disarray (42), and a pathogenic HCM-causing sarcomere gene mutation was later implicated. In another UK regional post-mortem registry, nine hearts from athletes who had succumbed to SCD, again showing normal wall thickness and mass, were discovered to have myoarchitectural disarray consistent with HCM (7).
These preliminary biomarker findings described for subclinical HCM suggest that the plasma proteome of individuals with subclinical HCM merits further exploration at scale, to better understand its potential clinical utility in terms of tracking pathophysiological myocardial substrate changes ahead of manifest LVH.

Description and function of the parent proteins linked to peptides identified in the study
Interactome data indicate that five of the proteins are connected in a network related to hypertrophy and fibrosis with potential relevance to the known myocardial substrate changes driving SCD in HCM, while C3 participates more distinctly in the inflammation network with inflammation increasingly gaining traction as a key pathophysiological player in HCM (38).

Extracellular protein:
Thrombospondin 1 (THBS1-peptide) is a non-structural extracellular matrix component with anti-angiogenic activity that is able to activate transforming growth factor-β, a potent profibrotic and anti-inflammatory factor (43). THBS1 has been described as a crucial regulator of cardiac matrix integrity, enabling the myocardium to adapt to increased pressure loading (44). It is minimally expressed in the normal heart but markedly upregulated following cardiac injury (45). The THBS family has five members, of which THBS1 is one of the best studied. In a mouse model of pulmonary hypertension, over-expression of THBS1 was demonstrated in the hypertrophied right ventricle (46). It was also overexpressed in LV myocardium from mouse models of cardiac hypertrophy (44,47) and from patients with LVH secondary to aortic stenosis (44).

Intracellular proteins:
The ubiquitously expressed single-copy Ras Suppressor Protein 1 (RSU1-peptide) gene is expressed specifically in the human heart (48) where it encodes a leucine-rich repeat protein. RSU1 interacts with PINCH and integrin-linked kinase (ILK) serving as a molecular scaffold for cellular focal adhesion (49). A marked increase in myocardial levels of ILK proteins has been reported in patients with congenital and acquired outflow tract obstruction (50) causing ventricular hypertrophy. Our study is the first to describe RSU1-peptide elevated in HCM and the first study to describe its potential as a plasma biomarker.
Talin 1 (TLN1-peptide) is a large dimeric cytoskeletal protein that activates integrins and mediates their adhesion to the actin cytoskeleton. Integrins are key mechanotransducers in cardiomyocytes and are intimately involved in the process of cardiac hypertrophy (51). During embryogenesis, cardiomyocytes exhibit high TLN1 levels, but these decline in the mature heart (52). TLN1 has been found to be up-regulated in the costameres, both in an HCM mouse model and in the myocardium of adult humans with heart failure (52).

Enzymes:
We observed elevated levels of Glutathione S-Transferase Omega-1 (GSTO1-peptide) in the myocardium (3) and plasma of HCM patients, whereas reduced levels of another glutathione S-transferase (Kappa-1, GSTK1) have been reported in murine HCM models (53). This may reflect the fact that the two anti-oxidant enzymes, GSTO1 and GSTK1, are functionally distinct, with the former localising to the cytosol and the latter to the mitochondria and peroxisomes (54); their expression may therefore be differentially impacted by HCM. Another study, in a cor pulmonale mouse model, has shown high levels of GSTO1 in the hypertrophied right ventricular myocardium (55), consistent with our observation in human tissue and plasma and indicating its potential as a marker to monitor cardiac oxidative stress.
Aldolase Fructose-Bisphosphate A (ALDOA-peptide) was found to be altered in the previously published myocardial proteomics analysis (3). It is a glycolytic enzyme found in skeletal as well as cardiac muscle, where it functions as a scaffolding protein binding to actin or actin-tropomyosin (56). Serum aldolase concentrations have been shown to be elevated in patients with Danon disease, a genetic disorder causing a weakening of the heart (excluded from this study) (57). ALDOA activity has been observed to be upregulated in response to hypoxia (58), which suggests that this protein could be an indirect marker of hypoxic stress in HCM.

Complement component:
The anaphylatoxin Complement C3 (C3-peptide) is generated by activation of the innate immune system and has been shown to be elevated in a limited number of patients with LVH (5 HCM, 3 hypertensives, 1 athlete) (59) and in a larger cohort of hypertensive patients (60). The human protein-protein interaction network for our six biomarkers (Fig. S2) illustrates how the C3 cluster representing inflammation appears distinct from the remaining five biomarkers that more generally reflect myocyte hypertrophy and fibrosis. Figure S3 implicates the glutamine gamma-glutamyltransferase 2 (TGM2) as the interacting node putatively linking mechanisms of myocyte hypertrophy and inflammation.
Indeed, a mouse model with increased cardiomyocyte TGM2 (61) has been shown to have up-regulated COX2 expression leading to LVH, fibrosis and eventual cardiomyocyte apoptosis.
It is known that myocyte hypertrophy and disarray lead to augmented protein synthesis in the myocardium, partly from the reactivation of fetal transcription (62). This may partly explain the high circulating levels of some of our biomarkers, particularly for those involved in intracellular hypertrophic signal transduction. The presence in the plasma of HCM patients of some of these proteotypic peptides may reflect hypertrophy-related microparticle secretion by cardiomyocytes or the liberation of intracellular components following cell death.

Clinical Correlations
We applied the assay to a randomly selected cohort of patients with LVH+ HCM and controls and replicated our [...] HCM patients could be related to mechanisms of myocyte hypertrophy and disarray, but they may also reflect myocardial fibrosis, or other downstream effects observed in heart failure.
Our data suggest that LVH+ HCM patients with high ML proteomics prediction scores (upper quartile) are more likely to be at high risk of SCD, with projected SCD rates that are 1.4-fold higher than in patients with ML prediction scores in the lower quartile. Apart from an association with NSVT (9,10), the ML prediction score was also related to CMR markers of adverse disease progression in HCM, including extensive LGE, LV mass and diffuse myocardial fibrosis by native T1 mapping. If our findings are confirmed in larger cohorts, the proteotypic peptide biomarkers we describe in this study may be a valuable additional factor for future risk stratification schemes.
We developed the entire proteomics method on a triple quadrupole MS-based platform to make the validation of multiple biomarkers high-throughput for large numbers of samples. This approach makes further validation and signature refinement possible in cohorts from other centers in the future, as well as application of the assay to disease models for novel therapy studies. Our assay uses plasma, a patient biofluid that, compared to myocardial tissue, is more readily accessible, obtained less invasively, easier to sample repeatedly, and simpler to process for storage and analysis in a standardized manner.
Elevated levels of NT-proBNP (a biomarker of ventricular wall stress (63)) associate with a higher risk of heart failure, death or transplantation in LVH+ HCM (64). In this study, there was no correlation between NT-proBNP and our proteomics assay (with the exception of RSU1, r_s = 0.21; P = 0.028), suggesting that the majority of our candidate peptides may be tracking myocyte hypertrophy (LV mass and MWT) and fibrosis (native T1 and LGE) rather than myocardial stress. Though results from enzyme-linked immunosorbent assays appraising collagen synthesis biomarkers in venous blood samples of HCM patients have previously been conflicting (40,64), recent data in Fabry disease (a genetic lysosomal storage disorder resulting in pathological cardiac hypertrophy) demonstrated a correlation between levels of type I collagen synthesis and degradation biomarkers in blood, and LV mass and scar by CMR (65). This would fit with our findings, where a different set of proteomics plasma markers of fibrosis similarly correlates with LV wall thickness, mass and scar by CMR in LVH+ HCM. We recently reported exploratory myocardial proteomics profiling experiments in which lumican was upregulated in LVH+ HCM compared to controls (3). A similar trend was observed in the current plasma experiments, but the differences between LVH+ HCM and controls were not significant enough to justify inclusion of lumican in the final multiplex panel.
Factors potentially explaining the attenuated lumican trends in HCM plasma compared to myocardium include the relatively small number of control samples used in the myocardial experiments (n = 7) when compared to the current work (n = 97), the fact that patients in the myocardial study exhibited more advanced drug-refractory disease (given they all underwent myectomy to treat LV outflow tract obstruction), and the fact that peptide biomarkers discovered in whole tissue myocardial homogenates are not necessarily liberated or secreted into the circulating plasma at sufficiently high concentrations to impact the HCM plasma proteome.

Study limitations
Though assay results were promising, this was a single center study. Results merit validation in a larger, multicenter study of both LVH+ and subclinical HCM. Other diseases with hypertrophy and other cardiomyopathies were not explored in the current work, thus the detected biomarkers may not be unique to HCM. The utility of candidate biomarkers was assessed at a single time-point in subclinical and established HCM vs. healthy volunteers and against a limited panel of clinically relevant parameters thought to be significant (e.g. BNP, LGE, MWT). Due to the high prevalence of cardiac implantable electronic devices (CIEDs) at the time of this study, not all participants underwent CMR. Targeted validation of the identified peptides by enzyme-linked immunosorbent assay approaches was not undertaken in this study.

CONCLUSIONS
Using a combination of targeted and non-targeted proteomic approaches, we have discovered six potentially useful circulating plasma biomarkers related to myocardial substrate changes in HCM, which correlate with the estimated sudden cardiac death risk.

*Machine learning prediction scores of the combined six marker assay calculated by a support vector machine supervised machine learning method in the study population.
Reporting Spearman's correlations (r_s) for continuous variables (proteomic analyte distributions are non-parametric) or point biserial correlations (r_pb) for binary variables.
Significant correlations are highlighted in bold.
Other abbreviations as in Table 1.

Plasma protein analytes selected for the biomarker assay were quantified by label-free mass spectrometry in HCM patients and controls. The six proteins showing differential expression in LVH+ HCM and controls were combined into a multimarker assay using machine learning.
‡Direction of the fold change comparing LVH+ HCM vs. controls and subclinical HCM vs. controls.
*Significant P values are highlighted in bold. Significance levels were calculated using the non-parametric Mann-Whitney-Wilcoxon test with P value adjustment for multiple comparisons by the Bonferroni method.
CoV, coefficient of variation; QC, quality control (see Supplementary Table S5). Other abbreviations as in Table 1.
by guest on March 7, 2020