Dissecting the polygenic basis of atherosclerosis via disease-associated cell state signatures

Summary Coronary artery disease (CAD) is a pandemic disease where up to half of the risk is explained by genetic factors. Advanced insights into the genetic basis of CAD require deeper understanding of the contributions of different cell types, molecular pathways, and genes to disease heritability. Here, we investigate the biological diversity of atherosclerosis-associated cell states and interrogate their contribution to the genetic risk of CAD by using single-cell and bulk RNA sequencing (RNA-seq) of mouse and human lesions. We identified 12 disease-associated cell states that we characterized further by gene set functional profiling, ligand-receptor prediction, and transcription factor inference. Importantly, Vcam1+ smooth muscle cell state genes contributed most to SNP-based heritability of CAD. In line with this, genetic variants near smooth muscle cell state genes and regulatory elements explained the largest fraction of CAD-risk variance between individuals. Using this information for variant prioritization, we derived a hybrid polygenic risk score (PRS) that demonstrated improved performance over a classical PRS. Our results provide insights into the biological mechanisms associated with CAD risk, which could make a promising contribution to precision medicine and tailored therapeutic interventions in the future.

Using single-cell RNA sequencing, researchers identified 12 atherosclerosis-associated cell states and found that Vcam1þ smooth muscle cell state contributed the most to the genetic heritability of coronary artery disease. A hybrid polygenic risk score (PRS) that considers cell-typespecific regulatory elements demonstrated improved performance over a classical PRS.

Introduction
Cell cycle stages and cell types are well understood dimensions of cellular heterogeneity that are routinely applied to distinguish cell populations. Still, within a population of relatively homogeneous cells, significant variability in specific responses to an identical stimulus can be detected, also known as cell states. Efforts to determine the extent of such cell-to-cell heterogeneity in pathological conditions have been revolutionized by the emergence of single-cell RNA sequencing (scRNA-seq)-based studies. To this end, cardiovascular research has witnessed an explosion of studies describing an unprecedented degree of cell diversity in atherosclerotic lesions. scRNA-seq and cytometry by time of flight (CyTOF) methods have identified at least 19 types of leukocytes, consisting of 3-5 different macrophage subsets, 3-5 T cell subsets, two B cell subsets, two types of monocytes, two types of dendritic cells, one NK cell subset, neutrophils, and eosinophils in atherosclerotic mouse aortas. 1 In addition, seven clusters of smooth muscle cells (SMCs) have been identified, demonstrating dedifferentiation of contractile SMCs into intermediate cells, termed ''SEM'' cells (stem cell, endothelial cell, monocyte) and further into fibromyocytes/fibrochondrocyte-like cells and to an osteogenic phenotype. [2][3][4][5][6] Additionally, eight endothelial cell (EC) subpopulations have been identified as a result of diabeto-genic high-fat diet or disturbed flow where proatherogenic conditions induced a dramatic transition of ECs into progenitor, proinflammatory, mesenchymal (EndMT), and immune cell-like phenotypes. 7,8 Still, consistent definition and comparison of altered cell states in atherosclerosis is missing, complicating the understanding of mechanisms and pathways that contribute to pathological cell states.
An individual's risk of atherosclerotic coronary artery disease (CAD) is determined by an interplay of environmental and genetic factors. The recent genome-wide association studies (GWASs) for CAD with over 1 million participants have identified over 300 risk loci. 9,10 Despite extensive progress in statistical and experimental tools used to link regions of the genome to disease risk, it remains challenging to identify the causal genes underlying genetic associations and the cell types through which the effect is mediated. We have recently demonstrated that cell-type-specific chromatin accessibility and gene expression provide a means for predicting the cell type of action for CAD loci. 11 To this end, we demonstrated that the cisregulatory elements active in SMCs and ECs show the highest enrichment of GWAS SNPs for CAD and blood pressure among the lesional cell types. However, the contributions of specific disease-associated cell states or gene signatures to the disease risk and to CAD heritability remain unknown.
To shed light into these questions, we performed scRNAseq of thoracic aortas of healthy and atherosclerotic Ldlr À/À /Apob 100/100 mice during disease progression. Our results identify 12 cell states that are increased in response to atherogenic changes. We validate our findings in bulk RNA-sequencing (RNA-seq) studies in a separate mouse experiment and in human samples from affected individuals. We identify common pathways, transcription factors, and ligands that define the cell state changes. Finally, we link our findings to GWASs to prioritize the relevance of atherosclerosis-associated cell states in the biology and heritability of CAD and apply this knowledge in polygenic risk prediction.

Mouse model
To model a disease stage course of atherosclerosis in mouse, we performed four combinations of genotype and dietary protocol. We fed the atherosclerosis-prone low-density lipoprotein receptor-deficient mice expressing only apolipoprotein B100 (Ldlr À/À / Apob 100/100 ) 12,13 (The Jackson Laboratory strain #003000) a highfat diet (HFD; Teklad TD.88137) for 0, 1, or 3 months to model prelesion, early, and late disease stages, respectively. We timed diet-starting age to equalize the age at sample collection between all groups (8 months old). Age-matched wild-type C57BL/6J mice fed chow diet were used as a healthy control. Male and female mice were used (mouse counts for each experiment are stated below). Throughout the study, mice were maintained on a 12-h light-dark cycle and had access to food and water ad libitum. All animal experiments were approved by the local ethics committee and carried out in compliance with European Union Directive 2010/63/EU.

Aorta single-cell RNA-seq library preparation
Mice were anesthetized with isoflurane and euthanized by cervical dislocation. The mice were perfused by cardiac puncture with 10 mL of ice-cold PBS supplemented with 20 U/mL heparin and placed on ice for dissection. The entire thoracic aorta (i.e., the ascending aorta, the aortic arch, and the descending aorta up to the diaphragm) was extracted and used in scRNA-seq sample preparation. The adventitia was mechanically removed under a dissection microscope and discarded. Each thoracic aorta minced with a scalpel and enzymatically dissociated into single-cell suspension in 0.8 mL of Multi Tissue Dissociation Kit 1 (Miltenyi Biotec #130-110-201) enzymatic mixture reconstituted in RPMI 1640 medium supplemented with 0.5% bovine serum albumin (BSA) and 20 mM HEPES buffer (pH ¼ 7.2). The mixture was incubated at 37 C with end-over-end rotation. After 20, 40, and 60 min of incubation time, tissue pieces were left to settle for 10 s, and 0.8 mL of cell suspension supernatant was collected from the tube and placed on ice. To continue the digestion, we added 0.8 mL of fresh enzymatic digestion mixture. The collected cell suspension was filtered through a 30 mm cell strainer, centrifuged at 400 g for 8 min at 4 C, resuspended in PBS supplemented with 1% BSA, and placed on ice. The three cell suspension fractions for each aorta were subsequently pooled together and red blood cell lysis was carried out by adding 9 volumes of ice-cold 1X RBC Lysis Buffer, Multi-species (eBioscience #00-4300-54) and incubating on ice for 3 min. Subsequently, we added PBS to normalize the buffer and collected the cells by centrifugation (400 g for 8 min at 4 C). Magnetic removal of dead cells was carried out with the Dead Cell Removal Kit (Miltenyi Biotec #130-090-101) with Miltenyi MS magnetic columns following the manufacturer's instructions. Cells were resuspended in PBS containing 0.04% BSA and counted by hemocytometry with trypan blue staining. Cell viability was between 74% and 85%. In each experiment group, aortic cell isolation was carried out separately from three male mice, the cells were stained with TotalSeq cell hashing antibodies (BioLegend) according to the manufacturers' recommendations, and subsequently the cells from individual mice were pooled in equal proportions into one lane of the Chromium Controller microfluidics chip (10x Genomics). We only used male mice to minimize experimental variation and increase sensitivity of the experiment to detect disease-associated changes. This decision was justified on the basis of bulk RNA-seq results (below), which indicated that the effect of treatment on the major cell type proportions was stronger than the effect of individual factors such as sex ( Figure S8C). We used the Chromium Single Cell 3 0 Kit (v2 Chemistry; 10x Genomics) to prepare scRNA-seq libraries. Paired-end high-throughput sequencing was carried out on an Illumina NovaSeq instrument (read 1: 26 bp, read 2: 91 bp).

Aorta bulk RNA-seq library preparation
Mouse aortas were dissected as for scRNA-seq (described above), flash-frozen, and cryogenically pulverized with the Cellcrusher cryo-press cooled with liquid nitrogen. Total RNA was isolated and treated with DNaseI with the Absolutely RNA Nanoprep Kit (Agilent). RNA was quantified with the Bioanalyzer RNA 6000 pico assay (Agilent) and RNA-seq libraries were prepared with the SMARTer v2 Stranded Total RNA-Seq Pico Kit (Takara Bio). Sample size for bulk RNA-seq was six per group, consisting of four males and two females (except the 3-month HFD group, which had four males and one female). High-throughput Illumina sequencing (read length 75 bp, single-end) was carried out at EMBL GeneCore.

Single-cell RNA-seq data analysis
Sequencing reads were processed with the Cell Ranger pipeline (version 3.0.2; 10x Genomics) and the 10x Genomics mm10 reference transcriptome package (version 3.0.0). The Cell Rangerfiltered cell barcode count matrices were subsequently processed with Seurat (version 3.1.0) 14 running in R version 3.5.3. The standard (log normalization-based) workflow recommended by Seurat authors was used. For each library, to remove low-quality barcodes, we visually evaluated cell quality metrics (genes per barcode, UMIs per barcode, and mitochondrial read fraction) by using violin plots and selected cutoffs, which resulted in retaining 87%-93% of barcodes. In total, 36,157 cells were retained with median 1,629 (mean 2,043) UMI counts per cell, median 875 (mean 965.4) genes per cell, and median 5.8% (mean 5.8%) mitochondrial counts. Cells from all libraries were integrated with the canonical correlation analysis (CCA) method implemented in Seurat version 3 14 with default parameters. After CCA, the standard count processing was rerun on the integrated assay with default parameters, except for the number of principle components used (set to 35) and the clustering resolution (set to 1.1). The resulting 23 clusters were manually annotated to a general cell type level with the following marker genes: macrophage (Csf1r, Cd68, Adgre1), smooth muscle (Myh11, Tagln, Cnn1), endothelial (Pecam1, Cdh5, Cldn5), pericyte (Rgs5), dedifferentiated SMC (Vcam1, Lgals3, Dcn), chondrocyte-like (Comp, Fmod), fibroblast (Fbln1, Pdgfra, Serpinf1), mesenchymal stromal (Ly6a, Pi16), NK/T (Cd3d, Cd8b1, Nkg7), and epithelial-like (Upk3b). Hashtag assignment was carried out with the HTODemux function of Seurat. As the Biolegend TotalSeq cell hashing staining reagent is a mixture of anti-CD45 and anti-MHC I antibodies, HTODemux was run separately for the immune and non-immune cells of each library. To remove ambient RNA contamination, we ran the DecontX function (celda R package version 1.1.6) 15 by using the automated clusters (resolution 1.1) as the cell population labels and we used the decontaminated counts for subsequent analysis. Cells with an estimated ambient RNA contamination fraction > 0.3 were excluded from downstream analyses as possible doublets or otherwise lowquality cells.
To identify disease-associated cell states, we divided the Uniform Manifold Approximation and Projection (UMAP) plot from the integration of all libraries into a grid of 200 by 200 (for visualization) or 50 by 50 (for cell selection) subdivisions and counted the number of cells from each library falling into each sector. The cell occupancy was normalized to the total number of cells in each library, scaled to a library size of 10,000 cells, and log 2 -transformed. Subsequently, for each sector, the fold change in cell abundance was calculated between the late disease sample and the healthy control sample. UMAP regions displaying a more than 2-fold (>1 log 2 fold change) increase in cell occupancy in response to disease were selected for further investigation.
Marker genes of disease-increased UMAP regions (cell states) were defined with the Wilcoxon rank-sum test (implemented in Seurat) comparing the cells located within the area of increased UMAP occupancy to the other cells of the same general cell type (i.e., cells not changed in abundance upon disease). Genes expressed in at least 10% of cells (in at least one of the cell populations of the comparison) were considered and the minimum required log fold change was 0.25. p values were corrected for multiple testing with false discovery rate (FDR) and p adj < 0.05 was considered significant. One highly reliable marker gene was selected to name the cell state whenever possible. The full marker lists are available in Table S2.
Gene Ontology enrichment analysis for cell state marker gene lists was performed with the g:Profiler web tool (access date 2020-03-26). 16 All cell state marker gene lists were tested for enrichment against all Gene Ontology Biological Process (GO:BP) categories. The cell state SMC-4 was not included in this analysis, as it had only three marker genes. To reduce redundant and overly general hits, we filtered the GO:BP enrichment results for each cell state marker gene list to remove categories with >1,000 genes in mouse and we further filtered for GO term semantic similarity by using the GOSemSim package (version 2.8.0) 17 by ranking the GO terms on the basis enrichment p value and, starting from the top, removing any less significant terms with Rel similarity measure > 0.9. After the GO term filtering, the top four most significantly enriched GO:BP terms were selected from each cell state (provided the terms satisfy p adj < 0.05) for a comparative enrichment heatmap.
To plot gene set enrichment at the single-cell level, we used the VISION package (version 2.0.0) 18 with the GO:BP and MSigDB Hallmarks 19 gene sets.
To predict which extracellular ligands may serve as upstream inducers of a cell state, we used the NicheNet package (version 0.1.0). 20 For the query gene signature (i.e., the gene set to be explained by ligand-receptor interactions), an entire list of marker genes for a disease-associated cell state was used (described above). The cells of the disease-increased cell state were considered the ''receiver'' cell population (expressing receptor) and all other cell types in the dataset were considered potential ''sender'' cells (expressing ligands). A ligand or receptor was considered expressed in a cell population if detected in at least 10% of the cells. Predicted ligand activities were ranked with the Pearson correlation coefficient (default).
We used SCENIC (version 1.1.1) 21 to predict transcriptional regulators, motifs, and regulons that are preferentially active in disease-associated cell states. SCENIC single-cell regulatory network inference was run following the published vignette for one cell type at a time. Cells from the maximally different disease stages (control and late disease) were used. For SMC-related cells, cells were further randomly subsampled with the automated clusters (resolution 1.1; described above), keeping a maximum of 200 cells per cluster. The cisTarget database was mouse mm9 transcriptional start site (TSS)-centered (510 kb), v9 motifs, seven species (download date 2020-01-24). The activity of each predicted gene network was scored in each cell and binarized, as described by the SCENIC authors. We used the Wilcoxon rank-sum test to find regulons that are differentially active in one cell state compared to the other cells of the same cell type. For differential activity, the regulon was required to be active in at least 20% of the cells of one population and satisfy log fold change > 0.15 and FDR < 0.05.

Decomposition of cell states in mouse bulk RNA-seq
We trimmed mouse aorta RNA-seq reads by using Trim Galore (version 0.4.4; GitHub: https://github.com/FelixKrueger/ TrimGalore) to remove adapters and bases with quality score below 20. We aligned reads to the mm9 mouse genome with STAR (version 2.5.4) 22 and used HOMER (version 4.9) 23 to generate a gene count matrix from uniquely mapped reads.
To evaluate relative changes in cell state abundance between atherosclerosis disease stages based on mouse aorta bulk RNAseq profiles, we applied the Cell Population Mapping method (scBio R package; version 0.1.5). 24 Aorta scRNA-seq cells and the integrated UMAP were used as the cell state space. The bulk RNA-seq count matrix was normalized for transcript length with reads per kilobase per million reads mapped (RPKM) and log-transformed. Cell Population Mapping was run in relative abundance mode (subtracting the mean of the reference group from all samples in the test group) with default parameters.
Cell state gene signature activity in human atherosclerosis study cohorts Gene expression microarray data was obtained for human aortic plaque (n ¼ 15; abdominal aorta), carotid plaque (n ¼ 29), femoral plaque (n ¼ 24), and non-atherosclerotic control artery (n ¼ 24; left internal thoracic artery) from the Tampere Vascular Study 25 and from human carotid artery segments classified as either advanced (n ¼ 16) or early (n ¼ 13) atherosclerotic plaque (GEO: GSE28829). 26 Gene expression levels were quantile-normalized with the preprocessCore R package (version 1.52.1; GitHub: https://github.com/bmbolstad/preprocessCore). To calculate gene signature activity scores in these bulk RNA profiles, we used the approach of Tirosh et al. 27 (implemented as the Seurat function AddModuleScore), wherein genes are binned on the basis of average log expression level across samples, and in each sample, a bin background level (calculated from random control genes in the same bin) is subtracted from the levels of the test genes. The full lists of marker genes of atherosclerosis-increased cell states were used as the tested gene sets (gene programs).

Prioritized genes at human CAD GWAS loci
To associate human CAD GWAS loci to their potential causal genes, we collected published gene prioritization results across multiple different prioritization approaches and GWASs.
Partitioned heritability with linkage disequilibrium score regression (LDSC) LDSC (version 1.0.1) 40 and the van der Harst et al. CAD GWAS 29 full summary statistics (downloaded from Mendeley Data: https://doi.org/10.17632/gbbsrpx6bs.1) were used for partitioned heritability analysis of CAD following the recommendations published by LDSC authors. The marker genes of atherosclerosis-associated cell states (described above) were used as the gene sets. The gene-transcribed regions in the hg19 human genome were padded with 100 kb upstream and downstream. The LD score window was 1 cM (default), the provided 1000 Genomes (1000G) Phase 3 EUR data files were used, and the 1000G Phase 3 EUR baseline models were included in the calculation.

Accessible chromatin by cell type in human atherosclerotic plaques
We used human endarterectomy single-cell assay for transposaseaccessible chromatin with sequencing (scATAC-seq) 11 transposase cut site coordinates and cell annotations (available at FigShare: https://doi.org/10.6084/m9.figshare.14501985.v2) to aggregate cut sites to pseudobulk at the level of cell lineages (SMC, EC, macrophage, T/NK, and B/plasma) and we called peaks in each cell lineage separately by using MACS2 41 with fragment extension 200 bp and shift À100 bp. Where equal-width peaks are stated, we resized peaks centering on the MACS2-called peak summit coordinate (cell type specific).

Gene set-based PRSs
We used the PRSice-2 (version 2.3.3) 42 feature PRSet, 43 which performs region-aware (coordinate set-based) SNP clumping, to generate clumping and thresholding PRSs based on sets of cell state marker genes, scATAC-seq peaks, or genes of a biological pathway. In PRSet, SNPs falling within regions of interest are preferentially retained for each LD clump of SNPs. The clumping dis-tance was 500 kb to either side of the index SNP and the LD r 2 threshold was 0.2.
The base data for all PRSs were the additive model summary statistics of the CARDIoGRAMplusC4D 1000 Genomes-based CAD GWAS, 44 a study that does not have significant sample overlap with the UK Biobank. 29 We filtered base data genetic variants to keep variants with INFO score R 0.8 and to exclude variants with strand-ambiguous alleles.
The UK Biobank 45 was used as the target cohort for all PRS scoring. We filtered the imputed genotype data to keep autosomal variants with MAF R 0.01 and INFO score R 0.8 and exclude variants with Hardy-Weinberg equilibrium test p < 1eÀ25 or genotype missingness rate > 0.1 by using PLINK (version 2.00). 46 The final variant set across base and target data consisted of 4,958,173 variants. We included participants of self-reported White ancestry in the precomputed UK Biobank PCA calculation (unrelated samples) and selected available imputed genotype data and filtered to exclude participants with sex chromosome aneuploidy, heterozygosity or genotype missingness outliers, participants with excess relatives, kinship inference analysis exclusions, those more than seven standard deviations away from mean in the first six principal components (PCs) (GitHub: https://github.com/ Nealelab/UK_Biobank_GWAS), and participants who had withdrawn consent. CAD phenotype status in the UK Biobank was defined on the basis of the example of Choi et al. 43 (specifically, GitLab: https://gitlab.com/choishingwan/prset_analyses/-/blob/ master/script/sql/generate_cad.sql), which defined CAD cases by using primary or secondary ICD10 or ICD9 codes for hospital inpatient records or cause of death and additionally OPSC4 operation codes and self-reporting by the participant. In total, there were 21,600 cases and 359,254 controls.
For gene set-based PRSs generated from gene coordinates alone (i.e., not using scATAC-seq regulatory region information), we extended the transcript regions to include 35 kb upstream and 10 kb downstream. To utilize scATAC-seq peaks to inform cell state marker gene PRSs, we only considered the scATAC-seq peaks called in the corresponding cell type and peaks within 5500 kb of the gene TSS were retained for PRS generation.
PRS performance was evaluated as set out in the PRSet method. 43 We evaluated variance explained by the PRS (PRS.R 2 ) by subtracting the pseudo R 2 of the full model (CAD $ PRS þ covariates) from the pseudo R 2 of the null model (CAD $ covariates). The covariates used in PRS modeling were sex, age (earliest age of CAD for cases, oldest age at attending UK Biobank center for controls), UK Biobank assessment center, genotype batch, and the first ten genetic PCs. We used the PRSet competitive p value calculation, 43 based on permutation testing (10,000 permutations), to test for signal enrichment compared to identically clumped SNPs in regions of the genome considered background (defined as either all genes or all scATAC peaks).

Identification of atherosclerosis-associated cell states
To provide an unbiased, enrichment free, analysis of all the cell types of the healthy and atherosclerotic vascular wall, we performed scRNA-seq from single cell suspensions extracted from thoracic aorta. Cells from three mice were labeled with hashtag antibodies and pooled from each of four conditions, including healthy control mice and Ldlr À/À /Apob 100/100 mice with pre-lesioned, early, and late atherosclerotic lesions, and prepared into libraries with the 103 Chromium system ( Figure 1A). The scRNA-seq profiles of 36,157 cells passed quality control and were selected for downstream analysis ( Figure S1). Joint analysis of atherosclerotic and control samples with automated clustering identified 23 cell clusters that corresponded to nine major cell types, including endothelial cells (ECs), NK/T cells, monocyte/macrophages (MPs), smooth muscle cells (SMCs), pericytes, dedifferentiated SMCs, chondrocyte-like cells, mesenchymal stromal cells (MSCs), and fibroblasts (FBs) according to marker gene-based curation (Figures 1B and S2 and  Table S1).
Extensive variability in the regional occupancy of the Uniform Manifold Approximation and Projection (UMAP) was observed across different disease stages after data integration, indicating a spectrum of cellular states associated with atherosclerosis. To define discrete cell states (within broader cell types) that are atherosclerosis associated, we selected regions of the UMAP plot where the local cell density was increased >2-fold in late disease compared to control. (Figures 1C and S3). Differential neighborhood abundance analysis based on the k-nearest neighbor graph (Milo method 47 ) largely confirmed the regions of altered abundance ( Figure S4). The marker genes of disease-increased cell states were defined by comparing the cells located within the area of increased UMAP occupancy to the other cells of the same cell type that did not change in abundance (full marker lists in Table S2). Therefore, it should be noted that the cell state markers represent a distinct signature from the general cell type markers that are obtained by comparison between cell types (Tables S1 versus S2). Using the marker gene sets to score gene program activity revealed that the cell state signatures tend to be upregulated gradually, and the cells within the defined cell state showed >1 SD unit of elevated expression relative to the expression variation in the cell type as a whole ( Figure S5).
Our analysis identified 12 cell states that were increased in at least one of the diseased conditions ( Figure 1D), which were named on the basis of one of the top marker genes. These included three SMC cell states, including Vcam1þ SMC, Col2a1þ SMCs, and Palldþ SMCs, three macrophage-derived cell states called Spp1þ MPs, Ccl4þ MPs, and Stmn1þ MPs, two EC cell states corresponding to Lrg1þ ECs and Calcrlþ ECs, and two NK/T cell states, Nkg7þ NK/Ts and Ramp3þ NK/Ts, as well as Smoc2þ FB state and Apoeþ MSC state. We noted a further candidate population of disease-increased cells among the contractile (Myh11þ) SMCs ( Figures S2B, S4, and 1D). However, it appeared to be a very mild perturbation, showing only three marker genes (Slc22a1, Rbp4, Ndufa3), and therefore was not included in subsequent analyses. Importantly, the cell states demonstrated temporal patterns corresponding to the progression of the disease. For example, the proportion of cells associated with EC and MP states as well as Vcam1þ SMC and Palldþ SMC state cells exhibited a gradual increase toward late disease stage, whereas NK/T states, Col2a1þ SMCs, and Smoc2þ FBs were mostly detected at the late disease stage ( Figure 1E). This highlights the importance of temporal resolution in scRNA-seq-based analysis of disease progression. Among the cell state markers, we selected LRG1, VCAM1, and PALLD for validation with immunohistochemistry ( Figure S6) and Lrg1 and Palld for confirmation by using spatial transcriptomics with the Molecular Cartography platform by Resolve Biosciences ( Figure S7). In addition, we used alternative marker genes to visualize the Vcam1þ SMCs (Col6a3), Col2a1þ SMCs (Col6a3 and Sox9), Ccl4þ MPs (Il1b), Spp1þ MPs (Abca1), and Stmn1þ MPs (Top2a) because the primary marker genes of these states exceeded the maximum expression level limit and had to be excluded at the panel design stage ( Figure S7). Altogether, this analysis confirmed the distinct location and identities of several of the cell states.

Validation of the atherosclerosis-associated cell states in mouse and human lesions
To confirm that the identification of atherosclerosis-associated cell states is not confounded by technical issues related to scRNA-seq, we further sough to analyze the presence of the gene signatures in bulk RNA-seq. In particular, cell dissociation could alter the relative proportions of cell populations in scRNA-seq experiments, an effect that would not occur with bulk RNA-seq. 48 We therefore performed a parallel bulk RNA-seq experiment with 5-6 replicates and estimated the performance of these two methods in capturing cell type proportions during atherosclerosis progression ( Figure 1A). Altogether 1,049 unique genes were upregulated and 851 downregulated during the course of disease development (Table S3). Notably, a large majority of the bulk RNA disease-induced genes were expressed in immune cells, in line with expected infiltration of these cells to the vascular wall during disease progression and indicating that changes in cell type proportion are a major driver of bulk RNA-seq differential expression ( Figures S8A and S8B). Deconvolution of bulk RNA-seq data with the scRNA-seq data as in Newman et al. 49 confirmed the gradual increase of macrophage and NK/T cells relative to other cell types ( Figure S8C). This was also evident from the scRNA-seq data itself, although the relative proportion of leukocytes was smaller compared to bulk RNA deconvolution results, possibly reflecting selective cell loss due to tissue dissociation ( Figure S8D). To further leverage the bulk RNA-seq data to validate the atherosclerosis-associated cell states, we estimated the relative cell state abundance changes by using cell population mapping, 24 where scRNA-seq profiles are used to infer the composition of cell states from bulk transcriptome data. This analysis supported the increase in cell state abundance for majority (9/12) of the cell states but failed to recapitulate Palldþ SMCs, Smoc2þ FBs, and Apoeþ MSC state changes, possibly because of the low amount of cells in these populations (Figure 2A). Altogether, this analysis supports the reliability of our workflow for cell state identification from single cell expression data.
We next sought to investigate the reproducibility of the atherosclerosis-associated cell states in public datasets of mouse and human scRNA-seq. To achieve this, we compared the enrichment of gene set modules in the cell type of interest in our data, in different mouse models of atherosclerosis 4 and in human coronary lesions 3 ( Figures S9-S14). In the Pan et al. 4

dataset including
Ldlr À/À (0, 8, 16, and 26 weeks of HFD) and Apoe À/À (8, 16, and 22 weeks of HFD) mouse models, the cell state markers were enriched in a specific subpopulation of cells, suggesting that they represent distinct phenotypes ( Figure S11). The fraction of gene set positive cells were increased upon disease progression for all other cell types except the Palldþ SMC and Calcrlþ EC states, which demonstrated a relative decrease from the earliest timepoints in both mouse models (Figures S12 and S13). In addition, FB and MSC states were only increased between 8 and 16 weeks of HFD in the Ldlr À/À mouse model, whereas NK/T states exhibited a decrease in cell fraction between 16 and 26 weeks of HFD, suggesting mouse-modeland time-point-specific differences.
In human lesions, 3 all the cell state markers with the exception of Calcrlþ ECs had a distinct localization within the cell type cluster ( Figure S14), supporting that the same cell states are present. However, the available human scRNA-seq datasets do not allow comparison between healthy and diseased vasculature, which is why we further extended our analysis to bulk RNA-seq datasets to investigate whether the cell state signatures were associated with disease progression. To achieve this, we studied the expression of the cell state markers in two human clinical cohorts, the Tampere Vascular Study, 25 representing 68 advanced atherosclerotic plaques (15 aortic, 29 carotid, and 24 femoral plaques) and 28 controls (left internal thoracic artery), and the Maastricht Pathology Tissue Collection, 26 representing atherosclerotic carotid artery segments from 13 lesions at the early intimal thickening/ xanthoma stage and from 16 advanced fibrous cap atheroma lesions. Altogether, 10 out of the 13 disease state gene signatures separated the diseased samples from the controls (Figures 2B and 2C). The Palldþ SMC and Calcrlþ EC markers were only induced in the aortic plaques, compared to femoral or carotid plaques or control arteries, suggesting vascular-bed-specific differences. These results demonstrate similarities between the atherosclerotic cell states in mouse and human and support a potential role for these genes in disease etiology.

Identification of cell-state-specific and shared pathways among gene signatures
The maintenance and transition of cellular states are controlled by environmental signals that translate into gene regulatory mechanisms and biological processes. To investigate the similarity of the biological process activities of the atherosclerosis-associated cell states, we next evaluated the uniqueness of the cell state markers and their associated gene ontologies. Comparison of the marker gene sets of each disease-associated cell state demonstrated that the large majority (71%; 1,516/2,122) of the signature genes were unique to one cell state ( Figures 3A and 3B). Still, 12 genes were common to five or more cell states, as exemplified by matrix Gla protein (Mgp), stem cell antigen-1 (Sca1; also called Ly6A) and legumain (Lgmn), syndecan 4 (Sdc4), and insulin-like growth factor binding protein 4 (Igfbp4), and could thus represent global markers of atherosclerosis.
Comparison of marker gene lists at the level of gene ontology allowed to further shed light into the phenotypes of these cell states (Figures 3C and 3D; Table S4). For example, Vcam1 itself and the other associated cell state markers participating in ECM remodeling have been previously iden-tified as hallmarks of SMC-derived intermediate cells, 4 also called fibromyocytes. 3 These cells have been suggested to differentiate into chondrocyte-like cells, 2,5 i.e., fibrochondrocytes, 4 which correspond to the Col2a1þ SMC state. This gave us a unique opportunity to model a three-state continuum for the SMCs (Myh11þ -Vcam1þ -Col2a1þ) not evident for the other cell types. The hypothesis proposing that contractile SMCs can undergo transdifferentiation into Vcam1þ SMCs and Col2a1þ SMCs was further corroborated by the results of pseudotime trajectory analysis ( Figure S15). Among ECs, the Lrg1þ ECs expressed markers of endothelial-to-mesenchymal transition, such as Sox4, 50 Tubb3, 51 and Fbln5, 52 in line with a recent report. 8 On the other hand, Calcrlþ EC state genes were indicative of regulation of endothelial function by mitochondrial reactive oxygen species (electron-transport chain) and shear stress (e.g., Calcrl, Klf4, Pecam1, Tek [Tie2], Jun, and Fos). 7 Our analysis also uncovered the phenotypes of the five atherosclerosis-associated immune cell states as similar to lipid-associated Trem2þ MPs (Spp1þ MP1), proinflammatory MPs (Ccl4þ MPs), proliferating MPs (Stmn1þ MPs), 1 Nkg7þ Cytotoxic CD8 T cells, and Ramp3þ antigen-specific CD4 T cells. 53,54 Despite extensive cell state specificity of the marker gene sets, we also identified shared biological processes between some cell states. Gene ontologies related to ECM organization, blood vessel development, endodermal cell differentiation, and angiogenesis were enriched among the shared genes of Vcam1þ SMC, Col2a1þ SMC, and Lrg1þ EC states, whereas enrichment for cholesterol transport and homeostasis genes was shared between Vcam1þ SMC, Col2a1þ SMC, and Spp1þ MP states ( Figures 3C and 3D). Taken together, these results provide evidence that the disease-associated cell states largely contribute to specific cellular functions in the atherosclerotic niche with, however, important overlapping activities related to vascular development, angiogenesis, and lipid metabolism.
Disease states share similar upstream regulators but differ in predicted TF activities We next sought to investigate how the gene signatures of disease-associated cell states could be defined by the cellular microenvironment by inferring intercellular communications and intracellular signaling networks. We first looked for expressed ligands and receptors and modeled their gene regulatory effects in putative ligand-receiving cells for the five most abundant disease states by using the NicheNet tool 20 ( Figure 4A). Ligands belonging to the transforming growth factor b (TGFb), inflammation (TNF, IL-1a/b), angiogenesis (VEGFA, FGF1/2), and cholesterol pathways (ApoE) were identified as the most probable ligands to give rise to the gene sets, suggesting extensive similarities in the microenvironmental signals underlying cell-state-specific gene expression (Figures 4A and 4B). Among them, Tnf and Il1a/b were mostly expressed by the lipid-associated Spp1þ and proinflammatory Ccl4þ MPs, suggesting extensive auto-and paracrine proinflammatory signals originating from myeloid cells ( Figure 4C). Importantly, these proinflammatory ligands were predicted to regulate several cell state genes specific for Vcam1þ and Col2a1þ SMCs ( Figure 4B). To test the validity of these computational predictions, we stimulated mouse SMCs in vitro by using the top-ranked ligand IL-1b for 24 and 48 h and modeled the pseudotemporal trajectory of the gene expression response on the basis of scRNA-seq. Our results demonstrated that the genes upre-gulated along the in vitro SMC IL-1b response trajectory were induced along trajectory of in vivo SMC disease-associated transition and vice versa, supporting the predictions ( Figure S16). Finally, Col2a1þ chondrocyte-like cells were predicted to be the main producers of TGFb1, VEGFA, and BMP2, highlighting their potential role in a cell-cell signaling network of autocrine osteochondrogenic signaling and paracrine signaling driving the fibrogenic and angiogenic gene expression programs in Vcam1þ SMCs and Lrg1þ ECs (Figures 4D and S17).
The predicted similarities in the microenvironmental signals suggest that the differences in disease-associated gene signatures could arise as a result of differential response of the cell types to the same stimulus and thus cell-state-specific transcription factor (TF) activity. To investigate this, we inferred TF regulatory networks by single-cell regulatory network inference and clustering (SCENIC) analysis for the three major cell lineages (SMCs, ECs, and MPs; Figure 5A). 21 First, we predicted the transcription factors for the Palldþ, Vcam1þ, and Col2a1þ SMC-derived cell states ( Figure 5B). The Palldþ cell state gene expression profile was predicted to be driven by proinflammatory TFs such as the NF-kB, AP-1, and Maf family members. In line with the predicted differentiation trajectory of the Vcam1þ and Col2a1þ cell states from contractile SMCs, the SCENIC predictions supported a gradient of TF activity (Figures 5E and S15C) and evolution of gene networks. CEBPD and RUNX1 were identified as the key regulators for both atherosclerosis-associated cell states despite the higher expression of Cebpd in Vcam1þ SMCs and Runx1 in Col2a1þ SMCs. In addition, the transition from Vcam1þ cell state to Col2a1þ state was predicted to be driven by the ER-Golgi stress transducer CREB3 family members 55 as well as EndMT associated HAND2 and SNAI1 TFs. 56 TFs induced by inflammation and metabolic stress, including XBP1, ATF4, RARG, and NFATC1/2, were identified as key drivers of the Col2a1þ state.
In line with marker gene expression, SOX4 was predicted as the main driver of the Lrg1þ EC state, followed by XBP1, TCF4, CEBPB, IRF3, CREB3L2, and ELK3 ( Figures 5C and  5E). The analysis was unable to identify candidate regulators for Calcrlþ EC state. Interestingly, XBP1, TCF4, CEBPB, CREB3L2, and ELK3 were also identified as key regulators of the Vcam1þ and Col2a1þ SMC states, suggesting a key role of these TFs across cell types. As expected, the inflammatory Ccl4þ MPs showed enrichment of proinflammatory TF motifs, such as STAT, IRF, and AP-1 factors, whereas the Spp1þ lipid-associated MPs were enriched for SREBF1, USF1, NR1H3, and the ER stress response TFs DDIT3 and XBP1, which all have been implicated in lipid homeostasis 57 (Figures 5D and 5E). The proliferating Stmn1þ MPs were enriched for TF motifs implicated in cell cycle regulation, as exemplified by the members of the E2F TFs and EZH2. Among the TFs identified, XBP1, MAFB, MLX, and NFE2L2 (also called NRF2) were also identified as drivers of the Vcam1þ and Col2a1þ SMC states. Overall, we conclude that the majority of the single-cell gene regulatory networks are cell state specific and could provide critical insights into essential factors driving the progression of atherosclerosis.
Interrogation of GWAS loci and partitioning the heritability of CAD highlights the importance of smooth muscle cell states GWASs have identified over 300 risk loci for CAD. 9,10,29 Nonetheless, for a vast majority of the loci, the causal gene(s) underlying the association are not known with cer-tainty. In the last few years, however, several new methods have emerged that, on the basis of genomic proximity, protein-coding variants, variant association to gene expression (QTL, TWAS), enhancer-gene maps, or similarity in gene functions or pathways, establish links between risk loci and genes, 9,28,38,39 providing hundreds of candidate causal genes. We made use of these candidate causal gene lists to compute their enrichment among the cell state gene sets by using the hypergeometric test with expressed genes (>1 TPM in any aortic cell type or state; 14,902 genes) as the background. Our results demonstrate that 11 of the 12 cell states included genes prioritized as candidate causal genes, with the highest enrichment detected for the three SMC state (Vcam1þ, Col2a1þ and Palldþ) and Lrg1þ EC state genes ( Figures 6A and S18). This enrichment trend was also evident when the marker gene lists were truncated to a specific number of top genes ( Figure S19). In a further analysis, we equalized the number of marker genes for each cell state across a larger range (top 25 to 500 genes) by also including sub-threshold marker genes where needed. These results ( Figure S20 confirmed SMC (Vcam1þ, Col2a1þ, and Palldþ) and EC (Lrg1þ) states as the most enriched for CAD-GWAS-prioritized genes, including when measured by overlap ratio.
To further investigate the contribution of atherosclerosis-associated cell state signatures to the SNP-based heritability of CAD, we applied the LDSC tool 40 to partition heritability to gene sets by using CAD GWAS summary statistics. In our analysis, CAD heritability was most significantly enriched in regions surrounding Vcam1þ SMC state genes, followed by Col2a1þ and Palldþ SMC states, Spp1þ and Stmn1þ MPs, and Lrg1þ ECs ( Figure 6B). Our results demonstrate significant enrichment of SMC cell state genes over all other cell types of the lesions, which could help in functionally interpreting the GWAS signal.
Cell-state-and pathway-specific PRSs provide insight into the biological mechanisms of CAD risk An emerging body of evidence has shown that aggregation and weighing of CAD-associated variants into PRSs can improve an individual's risk prediction beyond traditional risk factors and provide an opportunity to identify novel mechanisms influencing CAD risk. 58 GWAS results could be considered a composite of signals corresponding to CAD-relevant processes encoded by different genomic regions and biological pathways. We therefore sought to test how much polygenic risk for CAD is influenced by the cell state, cell type, and gene ontology pathways by aggregating the risk alleles across the respective gene sets. To study this, we employed the PRSice-2 extension PRSet, 43 which performs LD clumping in a region setaware manner, generating PRSs that preferentially include SNPs falling into regions of interest. We used the GWAS summary statistics from the CARDIoGRAMplusC4D 44 study as the basis for PRS generation, and we used the UK Biobank cohort (21,600 CAD cases and 359,254 controls) to evaluate PRS performance.
We first applied PRSet to cell state marker gene sets (gene body with À35 kb and þ10 kb flanks). To evaluate PRS performance, PRSet computes the amount of phenotypic variance explained by the PRS, defined as the increase in model R 2 when PRS is included, compared to a null model consisting of covariates only. Further, permutation-based significance testing is carried out, comparing the performance for a PRS to identically clumped SNP sets from background regions. 43 We observed that PRS derived from the Vcam1þ SMC state genes explained greatest CAD risk variance and was strongly enriched in the predictive power relative to background, followed by Col2a1þ SMCs and Lrg1þ ECs ( Figure 7A). Importantly, the cell state genes outperformed the corresponding cell type marker genes in predictive performance ( Figure S21). Truncating the cell state marker gene sets to a specific number of top genes revealed that performance does tend to increase with the number of genes; however, the increase appears more rapid for some cell states than others, and permutation-based p value (i.e., predictive power relative to background SNP sets) tends to plateau for several cell states ( Figure S22).
To investigate how marker genes of non-plaque cell types perform in a similar analysis, we derived marker sets (Table S5) for 79 human cell types across the body by using the gene expression profiles compiled from scRNAseq datasets. 59 As an evaluation of the marker sets, related cell types showed more pairwise shared markers ( Figure S23), as expected, and testing the 79 cell type marker sets (top 500 genes each) for overlap with CAD-GWAS-prioritized genes revealed the strongest enrichment in SMC and EC, followed by FB and adipocytes, while most cell types showed no significant enrichment ( Figure S24). Next, we constructed gene set-based PRSs for CAD by using between 20 and 500 top markers for each cell type (gene body regions þ flanks) and evaluated PRS R 2 . Out of the 79 cell types, hepatocytes ranked highest, followed by SMCs, with adipocytes and ECs also among the top ten cell types ( Figure S25). Several other cell types in the top ten tend to be ones that share markers with SMCs ( Figure S23).
Pathway-based PRSs have been suggested to better inform disease biology compared to classical PRSs. 43,58 We therefore also performed PRS analysis on the gene ontologies that were significantly enriched among the cell state signature genes ( Figures 2C and 2D). For each of the significant ontology categories ( Figures 2C and 2D), we collected all human genes assigned to the ontology and used these to create a CAD PRS. Interestingly, the PRSs derived from functional categories that were shared among several cell states ( Figures 2C and 2D), including cholesterol transport, lipid localization, blood vessel development, and angiogenesis gene sets, were among the top in predictive performance and enrichment significance ( Figure 7B). In addition, enzyme-linked and growth factor receptor signaling, cell division, leukocyte differentiation, and cell cycle, were among the top ten categories associated with CAD risk. Altogether, our analysis suggests pivotal roles for disease-associated cell states and pathways as mediators of the genetic risk for CAD.
A great majority of the genetic variants associated with CAD are located within non-coding elements of the genome where they are thought to play a role in gene expression regulation. In line with this, inclusion of functional annotations of genomic and epigenomic elements has been shown to improve the prediction accuracy of PRSs. 60 Therefore, we derived a next set of PRSs prioritizing for variants in plaque cell type regulatory elements, represented by scATAC-seq peaks. 11 Using scATAC peaks in range TSS 5 500 kb for cell state marker genes resulted in PRSs consisting of similar numbers of variants as the gene coordinate-based PRSs above, allowing direct comparison of power. Notably, PRS performance was considerably improved when we used the regulatory elements as functional priors, with $2-fold increase in R 2 (Figures 7A versus  7C). In line with gene coordinate-based analysis, Vcam1þ SMC state gene continued to outperform other cell states.
As the regulatory element-based analysis appeared to inform SNP selection, we further tested scATAC peak-based PRSs genome wide (i.e., without gene set limitations). Out of the cell types studied, SMC scATAC-based PRS performed the best, although the combined PRS from all plaque scATAC cell types outperformed it ( Figure 7C). This was not dependent on the total peak counts, as selecting the strongest 10,000 cell type unique peaks reproduced similar results ( Figure S26). At its optimal p value threshold, the standard (classical) genome-wide PRS outperformed the scATAC PRSs. However, the p value threshold curves for scATAC and standard PRSs were differently shaped, and the scATAC PRS continued to gain power even at weaker p value thresholds whereas the standard PRS lost power ( Figure 7D). Based on this observation, we constructed a hybrid PRS where all strong p value SNPs were included irrespective of scATAC data, and scATAC was only used for weaker p value variants. This hybrid PRS outperformed the classical PRS at its optimal p value threshold ( Figure 7D).
Because other tissues such as liver and adipose tissue have been associated with the risk of CAD, we also generated genome region-based PRSs for CAD by using the recent scATAC atlas of 30 adult human tissues, consisting Figure 7. Pathway-and cell-state-specific polygenic risk scores shed light into the genetic basis of CAD (A) Cell-state-specific PRS was constructed with (A) gene coordinates (À35 kb upstream to 10 kb downstream) using PRSet. 43 To obtain the empirical p value, random SNP sets containing the same number of post-clumping SNPs were selected from background regions of the genome, selected from all genic regions. (B) Explained variance of each pathway-specific PRS to polygenic risk of CAD calculated for the gene sets listed in Figure 3C.
(C) Cell-state-specific PRS analysis constructed with plaque scATAC-seq peak coordinates that were found within 5500 kb of the TSS.
(D) Proportion of variance of CAD explained by PRS in genome-wide analysis. The values represent PRS calculated for all cell-type-specific scATAC-seq peaks at different p value thresholds, which are compared to the classical genome-wide clumping and thresholding PRS.
of 111 cell types and approximately 890,000 peaks. 61 To confirm our cell type marker peak selection, we generated a pairwise marker sharing matrix, which revealed the expected similarity patterns between related cell types ( Figure S27). In line with our plaque scATAC-based PRS analysis, SMC-related cell types explained the greatest CAD risk variance, followed by adipocytes, ECs, and fibroblasts. Notably, cardiomyocytes, hepatocytes, and immune cells scored considerably lower in R 2 -based ranking irrespective of the total peak count used (peaks selected by ATAC signal strength; Figures S28 and 29). The cell type ranking was similar for peaks selected by cell type specificity ( Figure S30).

Discussion
By performing a cell-type-unbiased time-course analysis of single cell transcriptomes in mouse atherosclerotic aorta, we identified 12 disease-associated cell states. Majority of these states are concordant with the previous studies focusing on specific subtypes of cells and lineage-tracing experiments. [1][2][3][4][5][6][7][8]53,54 In addition, we identified four less abundant disease states corresponding to Calcrlþ ECs, Smoc2þ FBs, Apoeþ MSCs, and Palldþ SMCs that require further experimental validation. Specifically, these cell states were not activated in Ldlr À/À and Apoe À/À mouse models to the same extent and could also represent states that are specific to certain aortic vascular beds. Still, our enrichment-free setup allowed us to provide the first comparison of shared and differential gene regulatory mechanisms underlying the atherosclerosis-associated cell states.
Despite that the majority of marker genes were specific to a disease state, a handful of genes were shared between five or more states and thus represent potential candidates for global biomarkers of CAD. To this end, circulating levels of LGMN, MGP, SDC4, and IGFBP4 have been associated with atherosclerosis and acute cardiovascular events with potential prognostic or risk stratification value. [62][63][64][65][66] For example, levels of MGP and IGFBP4 could reflect differential vascular calcification burden and highlight differences in plaque pathobiology between ST-segment-elevation myocardial infarction (STEMI) and non-ST segment elevation myocardial infarction (NSTEMI). 66 Signals from the microenvironment can be transmitted into the intracellular gene expression programs through multiple layers of signal propagation including ligand-receptor interactions, signaling molecules, and transcription factors. Based on the predicted ligand-receptor activities and downstream gene expression changes, our analysis strongly suggests that TGFb, IL-1a/b, TNF, VEGFA, FGF1/ 2, APOE, and SPP1 signaling pathways dominate in the atherosclerotic microenvironment and are shared as upstream inducers of many cell states. This suggests that the distinct gene content of the disease-associated gene signatures is a result of cell-type-specific responses to the same environmental stimuli. This is in line with the recent report by us and others where proatherogenic stimulus-induced gene expression responses in ECs, SMCs, and MPs appeared very cell type specific. 67,68 Supporting this, we demonstrate that cell states are largely governed by specific TFs that form intricate gene regulatory networks. Still, a few TF modules were shared between the SMC and EC cell states (XBP1, TCF4, CEBPB, CREB3L2, and ELK3) and between SMC and MP cell states (XBP1, MAFB, MLX, and NRF2), identifying them as potential candidates driving the expression of the genes in shared ontology categories related to ECM organization, blood vessel development, angiogenesis, cholesterol transport, and lipid localization. Indeed, XBP1, a key modulator of unfolded protein response, has been demonstrated to play an important role in the regulation of lipid metabolism and angiogenesis and the inhibition of this pathway alleviates atherosclerosis. 57,69,70 Analysis of the atherosclerotic cell state signatures in human lesions supported the relevance of the majority of the cell states in separating healthy from diseased samples or early disease from advanced disease samples. But in cases where the cell state gene sets were short, the power of sample classification was not apparent. Supporting the relevance of the disease signatures, we further demonstrated that many of the cell state marker genes were predicted target genes of CAD GWAS variants and the genome regions where the marker genes reside contribute significantly to the heritability of CAD. Importantly, the genetic variation in the Vcam1þ SMCs and Col2a1þ SMCs contributed to CAD heritability beyond other cell states, suggesting that these cell states are particularly important in understanding the pathobiology of atherosclerosis. This is also supported by other recent studies looking into the cell-type-specific expression of predicted GWAS target genes. 71,72 Importantly, our analysis demonstrated that atherosclerosis-associated cell state markers explain a larger proportion of CAD risk variance compared to cell type markers, suggesting a more prominent role for genes that participate in pathological changes compared to those that maintain healthy cell identity.
Substantial ongoing efforts are looking into applying CAD PRS in the clinical practice including risk stratification and prediction of treatment response. By enhancing early prediction of CAD beyond traditional risk factors, PRS could also guide new treatment strategies. This is exemplified by several landmark studies that demonstrated that the individuals at high genetic risk for CAD experience the greatest benefit from lipid-lowering treatments. [73][74][75] However, the standard PRS sums an individual's genetic profile to a single estimate that may fail to identify more nuanced phenotypes that are necessary for risk stratification, prediction of treatment response, or identification of pathways leading to novel treatment. 43 Here, we evaluated the performance of PRS, which accounts for genomic substructure/regional functional heterogeneity by aggregating risk alleles across cell states/types and biological pathways. Our results demonstrate that while the regular (genome-wide) PRS outperforms the cell state gene set-based PRSs in absolute accuracy, regulatory element-based PRS effectively quality filters the SNPs with weaker GWAS p value. This allowed us to construct a hybrid PRS that achieved higher power than the regular PRS at its optimal p value threshold. On the other hand, pathway-based PRS identified biological processes conferring higher genetic risk, providing pathway-level processes to target in drug design. In particular, the pathways shared between several cell states related to cholesterol transport, lipid localization, extracellular matrix, blood vessel development, and angiogenesis could provide actionable targets. While our analysis provides the first steps toward cell-state-, cell-type-, and pathway-level understanding of the genetic risk, future studies are needed to evaluate whether incorporation of such functional prior information improves polygenic prediction accuracy in individual stratification or prediction of treatment response.
In summary, we provide in depth characterization of atherosclerosis-associated cell states and demonstrate the value of cell-state-specific markers in understanding the genetic basis of CAD. Substantial work still needs to be done to functionally validate the role of these genes in the pathophysiology of atherosclerosis. Defining the mechanisms that contribute to distinct cell states in pathological conditions could provide a basis for applying precision medicine and targeted therapies in the future.

Data and code availability
The accession number for the mouse bulk RNA-seq and scRNA-seq reported in this paper is GEO: GSE205929 and GSE205930. The MOVAS count matrices are available at FigShare DOI: https://doi.org/10.6084/m9.figshare. 20059649.v1. The Maastricht Pathology Tissue Collection data were downloaded under GEO accession number GEO: GSE28829. Previously published mouse and human atherosclerosis scRNA-seq datasets were obtained from GEO accessions GEO: GSE155513 and GSE131778, respectively.