Single-cell multiome of the human retina and deep learning nominate causal variants in complex eye diseases

Summary Genome-wide association studies (GWASs) of eye disorders have identified hundreds of genetic variants associated with ocular disease. However, the vast majority of these variants are noncoding, making it challenging to interpret their function. Here we present a joint single-cell atlas of gene expression and chromatin accessibility of the adult human retina with more than 50,000 cells, which we used to analyze single-nucleotide polymorphisms (SNPs) implicated by GWASs of age-related macular degeneration, glaucoma, diabetic retinopathy, myopia, and type 2 macular telangiectasia. We integrate this atlas with a HiChIP enhancer connectome, expression quantitative trait loci (eQTL) data, and base-resolution deep learning models to predict noncoding SNPs with causal roles in eye disease, assess SNP impact on transcription factor binding, and define their known and novel target genes. Our efforts nominate pathogenic SNP-target gene interactions for multiple vision disorders and provide a potentially powerful resource for interpreting noncoding variation in the eye.


Correspondence howchang@stanford.edu
In brief Wang et al. present a joint single-cell atlas of gene expression and chromatin accessibility of the adult human retina, which they use to analyze risk variants from genome-wide association studies of five eye diseases. They integrate this atlas with chromatin conformation data, expression quantitative trait loci, and deep learning models to nominate gene and cellular targets in the retina for hundreds of noncoding variants implicated in vision disorders.

INTRODUCTION
Genome-wide association studies (GWASs) of eye disorders, such as glaucoma, myopia, and age-related macular degeneration (AMD), have uncovered hundreds of genetic polymorphisms associated with ocular disease. [1][2][3][4][5] However, the vast majority of variants identified by GWASs reside in noncoding regions of the genome, making it challenging to interpret their function. 6 To better understand how noncoding variants mechanistically contribute to ocular pathology, it would be valuable to map in which cell types their corresponding loci are active. This information would provide novel insights into the cellular biology of genetically complex eye diseases and help nominate specific cell types as targets for therapies.
A recent advance in studying the noncoding genome has been the development of single-cell multiomics technologies, such as paired single-cell RNA sequencing (scRNA-seq) and singlecell assay for transposase-accessible chromatin sequencing (scATAC-seq). While scRNA-seq can classify the different cell types of a tissue based on their transcriptional profiles, its combination with scATAC-seq allows for the additional mapping of cell-type-specific chromatin accessibility. Together, these techniques can reveal the activity of noncoding DNA elements identified by GWASs and have been used to interrogate risk variants for conditions such as Alzheimer disease, Parkinson disease, autism spectrum disorder, and autoimmunity. [7][8][9] Investigations of the noncoding genome have likewise benefitted from analytical innovations, such as application of convolutional neural network (CNN)-based deep learning to predict the effects of noncoding polymorphisms. [10][11][12] Progress in this area has recently led to models with resolution down to a single nucleotide, enabling accurate determination of the critical bases within cis-regulatory sequences. 8,12,13 These models offer a validated approach to prioritize noncoding variants with functional relevance and are particularly suitable for tissues in which experimental manipulation is difficult.
Here we generated a joint scRNA-and scATAC-seq atlas of the adult human retina composed of more than 50,000 cells and used it to analyze single-nucleotide polymorphisms (SNPs) implicated by GWAS of five eye diseases: AMD, glaucoma, diabetic retinopathy (DR), myopia, and type 2 macular telangiectasia (MacTel). Layering this atlas with a HiChIP enhancer connectome, 14 expression quantitative trait locus (eQTL) data, 15 and base-resolution deep learning models, 12 we predicted noncoding SNPs with causal roles in eye disease. Our efforts nominate pathogenic SNP-target gene interactions for multiple vision disorders and provide a potentially powerful resource for interpreting noncoding variation in the eye.

RESULTS
Single-cell multiomics reveal the gene expression and chromatin accessibility landscapes of cell types in the human retina To generate a single-cell multiome of the human retina, we performed joint scRNA-and scATAC-seq profiling on eight postmortem retinas from four individuals who had no history of eye disease (Table S1). After quality control filtering ( Figure S1) and removal of putative doublets ( Figures S2A and S2B), we obtained a total of 51,645 human retinal cells in 22 clusters that we assigned to 13 different cell types ( Figures 1A, 1B, and S2C). These included abundant cell types, like rod photoreceptors and M€ uller glia, as well as rarer cell types, such as astrocytes and microglia, which each constituted only 0.4% of profiled cells ( Figure 1C; Table S2). Consistent with published scRNA-seq studies of the human retina, [16][17][18][19] we observed cell-type-specific expression of many genes, including PDE6A in rod photoreceptors, GRIK1 in OFF-cone bipolar cells, RLBP1 in M€ uller glia, GRM6 in ON-cone and rod bipolar cells, PRKCA in rod bipolar cells, ARR3 in cone photoreceptors, GAD1 in GABAergic (GABA-) amacrine cells, ONECUT1 in horizontal cells, SLC6A9 in AII-and other glycinergic (gly-) amacrine cells, NEFL in retinal ganglion cells, GJD2 in AII-amacrine cells, GFAP in astrocytes, and C1QA in microglia ( Figure 1D and S3A). We also identified a list of candidate marker genes based on differential expression for each of the 13 cell types (Data S1).
Using shared barcodes from joint multiomics profiling, we next assigned scATAC-seq profiles to the 13 cell types characterized by scRNA-seq. The resulting chromatin accessibility profiles clustered by cell type ( Figure S4A) and were highly similar to those from a public human retina scATAC-seq dataset ( Figure S3B), supporting the validity of our multiome. Peak calling performed on scATAC-seq profiles from each cell type combined into pseudobulk ATAC replicates uncovered a total of 620,386 chromatin accessibility peaks (Figure 2A; Data S2). These scATAC peaks were concordant among donors ( Figures S4B and S4C) and included more than 90% of peaks from a published bulk ATACseq study of the human retina (Figures 2B), 20 indicating that single-cell multiomics can recapitulate bulk ATAC-seq data. Conversely, more than half of the scATAC peaks were unique to the single-cell dataset ( Figure 2B), and nearly 40% of scATAC peaks were accessible in only one cell type ( Figure S5A). In line with this, we found 197,826 scATAC marker peaks enriched in a cell-type-specific manner ( Figure 2C; Data S3), including many located near cell-type-specific genes ( Figure 2D).
With these scATAC peaks, we conducted a motif enrichment analysis to predict which transcription factors (TFs) might be active in each cell type ( Figure 3A; Data S4). In accordance with published literature, we observed enrichment of binding motifs for TFs with known cell-type-specific functions, such as OTX2 in photoreceptors and bipolar cells, 21 ONECUT family members in horizontal cells, 22 POU4F family members in retinal ganglion cells, 23 and SPI1 (PU.1) in microglia. 24 For some TFs, cell-type-specific activity was also supported by footprinting analysis of scATAC peaks ( Figure 3B), which revealed motif centers to be protected from Tn5 transposition, consistent with TF occupancy. These data offer a cell-type-specific catalog of candidate TFs in the adult retina and may aid our understanding of gene-regulatory networks controlling vision.
Single-cell multiomics uncover the cellular contexts of variants implicated by ocular disease GWASs With our single-cell multiome, we sought to better understand risk loci identified by GWASs of complex eye disorders. To this end, we compiled a list of 1,331 unique index SNPs from the NHGRI-EBI GWAS Catalog representing GWAS hits for five eye diseases: AMD, glaucoma, DR, myopia, and MacTel (Figure 4A; Table S3). 25 The vast majority (96.5%) of these SNPs localized to noncoding regions of the genome and, thus, could not be interpreted with scRNA-seq data alone. We performed linkage disequilibrium (LD) expansion on all index SNPs to include nearby variants with high probability of coinheritance (LD R 2 > 0.9 based on phase 3 genotypes from the 1000 Genomes Project) ( Figure S5B). 26 From this, we obtained a total of 7,100 SNPs in loci associated with complex eye disorders, 7,034 (99.1%) of which were noncoding.
We first examined the small number of LD expanded SNPs that encoded an amino acid change or premature stop codon (Data S5). Using our scRNA-seq data, we found that coding variants implicated in AMD and glaucoma were most strongly expressed in microglia and M€ uller glia, respectively (Figure S6A). As a comparison, we also assessed the cell type expression of mutations known to cause monogenic retinal diseases based on gene annotations from the Retinal Information Network ( Figure S6B). 27 Consistent with expectations, genes for primary photoreceptor disorders, such as cone-rod dystrophy and retinitis pigmentosa, were most highly expressed in rods and cones, whereas disease genes for optic atrophy, a condition hallmarked by degeneration of retinal ganglion cells, showed the greatest expression in this cell type.
We next turned to the 7,034 unique SNPs after LD expansion that localized to the noncoding genome (Data S6). To determine in which retinal cell types these SNPs might be active, we overlapped SNP locations with scATAC peaks from our dataset. We found that 1,152 noncoding SNPs (16.4%) overlapped with a scA-TAC peak ( Figure 4B, S5C, and S5D) and that most SNP-containing peaks were present in only one or two cell types ( Figure S5E). We then conducted two orthogonal analyses to refine our list of SNPs for those more likely to possess gene regulatory functions. First we identified SNPs in scATAC peaks that were co-accessible with peaks in promoter regions, reasoning that this would select for SNPs in active enhancers. We detected 39,552 such promoter peaks in the human retina, 58.3% of which were co-accessible with at least one scATAC peak ( Figure 4C). Leveraging our paired scRNA-and scATAC-seq data, we also searched for SNPs in peaks whose accessibility correlated with the expression of a nearby gene and termed genes that met this criterion ''predicted target genes.'' Using this method, we predicted target genes for 199,055 (32.1%) of the 620,386 scATAC peaks in our dataset ( Figures 4D and S5F). For nearly half (44.3%) of these peaks, our predictions differed from the nearest gene on the linear genome ( Figure S5G), suggesting that noncoding SNPs do not necessarily regulate their nearest gene.
We identified 241 SNPs in scATAC peaks that were co-accessible with promoter peaks and 374 SNPs that had predicted target genes, with 202 SNPs meeting both criteria. As an example, we examined rs4821699 residing in an intron of TRIOBP on chromo-some 22. This locus has been implicated in glaucoma by multiple GWASs and encodes a protein thought to regulate cytoskeletal organization. 2,28,29 We observed that rs4821699 was most accessible in retinal ganglion cells ( Figure 4E), the major cell type that undergoes degeneration during glaucoma. Based on correlations with gene expression, the peak containing this SNP was predicted to target TRIOBP. We hypothesize that rs4821699 might therefore play a role in glaucoma by altering TRIOBP expression in retinal ganglion cells.

OPEN ACCESS
A handful of SNPs associated with eye diseases have been studied experimentally using retinal organoids derived from induced pluripotent stem cells. One such SNP is rs17421627, an index SNP from GWASs of MacTel, representing a T-to-G substitution on chromosome 5. 5, 30 We determined rs17421627 to be one of only five SNPs for MacTel with a predicted target gene and found the SNP to be most accessible in M€ uller glia and astrocytes ( Figure 4E). Using linked gene expression data, we also predicted rs17421627 to act on LINC00461, a long noncoding RNA. Consistent with these predictions, deletion of the locus containing rs17421627 in human retinal organoids has been shown to significantly downregulate LINC00461, with the strongest effect in M€ uller glia. 31 These examples illustrate how single-cell multiomics can reveal the cellular targets of noncoding variants in the retina and show how they might contribute to eye disorders.
Integration of the single-cell multiome with HiChIP and eQTL data validates SNP-target gene predictions To further prioritize our list of SNPs, we combined our data with two complementary methods for identifying functional SNP-gene  S7B). 14, 33 We uncovered 16,692 loop anchors connected by 9,670 HiChIP loops, including several linking regions of chromatin accessibility to the transcription start sites (TSSs) of cell-type-specific genes ( Figure S7C; Data S7). Of these loops, more than 95% overlapped with a scATAC peak in both anchors, and more than 99% overlapped with a peak in at least one anchor ( Figure 5A). This result shows that accessible chromatin sites identified in scATAC-seq data possess biochemical characteristics of active enhancers and supports their connection to target genes. We also analyzed our list of SNPs using published human retina eQTL data from the Eye Genotype Expression (EyeGEx) database. 15 For more than 90% of SNPs in scATAC peaks, retina eQTL data were available ( Figure 5B), enabling genes whose mRNA expression in the human retina changed with specific SNPs to be identified at the bulk tissue level. We found 187 disease-associated SNPs in scATAC peaks that were linked to a gene by a H3K27ac HiChIP loop. These included rs9966620, the top SNP from a GWAS of DR representing a G-to-A transition in an intron of TTC39C on chromosome 18. 34 Using our multiome, we determined that the scATAC peak con-taining rs9966620 was most accessible in rods ( Figure 5C). However, this peak also correlated with the expression of multiple target genes, hampering efforts to interpret how the SNP might function. Incorporating our HiChIP data, we were able to locate a 3D loop connecting rs9966620 with a region 75 kb upstream. This region intersected the TSS of only one gene, TTC39C-AS1, suggesting that rs9966620 may modulate DR risk by interacting with TTC39C-AS1 in rods.
We also detected 596 disease-associated SNPs in scATAC peaks that were significantly associated with a gene by eQTL analysis. One example is rs2730260, an SNP in an intron of VIPR2 that has been implicated in myopia. 35 This locus encodes one of two known receptors for vasoactive intestinal peptide (VIP), a signaling molecule involved in visual processing. 36 We found that rs2730260 resided in a chromatin accessibility peak specific to M€ uller glia that again had multiple predicted target genes ( Figure 5C). This ambiguity was clarified by retina eQTL data, which showed that variation at rs2730260 significantly correlated with the expression of only VIPR2 ( Figure 5D), supporting this gene as the SNP's primary target. Integration of eQTL data similarly improved our interpretation of rs66475830 on chromosome 6 in the FRK-NT5DC1-COL10A1 risk locus for AMD. 37  determined rs66475830 to be accessible in amacrine and horizontal cells and predicted TSPYL1 and TSPYL4 as target genes (Figure 5C). Retina eQTL analysis revealed that variation at this position was significantly associated with TSPYL4 expression but not that of other nearby genes ( Figure 5D), nominating TSPYL4 as the effector gene of rs66475830.
Last, we identified many SNP-target gene relationships supported by HiChIP and eQTL data, such as rs77272443 and rs4102217 located in risk loci for myopia and glaucoma, respectively. 39,40 For both of these SNPs, HiChIP and eQTL analyses again refined target gene predictions ( Figures S8A and  S8B), demonstrating how the combination of single-cell   (C) Number of scATAC peaks co-accessible with each promoter peak. Co-accessible was defined as scATAC peaks whose accessibility showed a correlation score greater than 0.3. (D) Number of predicted target genes for each scATAC peak. Predicted target genes were defined as genes whose RNA expression showed a correlation score >0.3 relative to the accessibility of the tested scATAC peak. (E) Sequencing tracks of chromatin accessibility near rs4821699 (chr22:37719685) and rs17421627 (chr5:88551768). Genes in the sense and antisense directions are shown in red and blue, respectively. The location of each SNP is depicted by a vertical gray line. Gray arcs indicate predicted target genes for the scATAC peak containing the SNP of interest. Predicted per-base counts Müller glia (homeodomain TFs)   Integration of the single-cell multiome with baseresolution deep learning nominates functional mechanisms for disease-associated SNPs CNN-based deep learning models have proven capable of discerning disease-associated SNPs from other noncoding variants. 8,10,11 As a final method to prioritize SNPs in our dataset, we therefore trained CNNs derived from the BPNet architecture on scATAC-seq profiles for each of the 13 retinal cell types (Figure 6A, S9A, and S9B). 12 At each SNP region, we compared the projected per-base change in chromatin accessibility between reference and alternate alleles using models specific to the different cell types. These calculations allowed us to identify ''high-effect'' SNPs, which we defined as SNPs predicted to cause a statistically significant (false discovery rate < 0.01) absolute log2 fold change of allele-specific read counts of greater than 0.5 in local chromatin accessibility in any cell type. We found 23 SNPs (2.0%) residing in scATAC peaks that qualified as high-effect, a greater percentage than among index SNPs, LD expanded SNPs, random SNPs matched for GC content, and random SNPs residing in scATAC peaks ( Figure 6B; Data S8). One of the top-scoring SNPs was rs1532278, an index SNP associated with myopia and residing in an intron of CLU on chromosome 8. 3 Our atlas predicted rs1532278 to regulate CLU, a notion reinforced by eQTL data, and determined the SNP to be accessible in nine of 13 retinal cell types (Figures 6C and 6D). Despite this, base-resolution models projected a T-to-C transition at rs1532278 to alter chromatin accessibility only in M€ uller glia, specifically by disrupting the motif of a homeodomain TF. Our findings suggest that even though rs1532278 is accessible across multiple cell types, its functional effect in the retina might be restricted to M€ uller glia because of a cell-type-specific homeodomain TF. We speculate that this TF could be LHX2, given its robust expression in M€ uller glia by our scRNA-seq data ( Figure 6E) as well as data from animal models. 41 Another high-effect SNP was rs1874459 located in an intron of CDH11 on chromosome 16, a locus implicated by multiple GWASs for glaucoma. 2,28 Using our multiome, we found rs1874459 to be most accessible in rod bipolar cells and predicted CDH11 as one of its target genes, an idea supported by eQTL data (Figures 6C and 6F). Incorporating base-resolution models, we then determined that the G-to-C transversion represented by rs1874459 introduced a new basic-helix-loop-helix (bHLH) domain, which was expected to increase accessibility in rod bipolar, OFF-cone bipolar, ON-cone bipolar, gly-amacrine, and AII-amacrine cells. Of the bHLH TFs, members of the neuroD and neurogenin families in particular were predicted by motif analysis to be significantly enriched in these five cell types ( Figure 3A; Data S4). We thus compared all neuroD and neurogenin family members using our scRNA-seq data, which revealed only NEUROD4 to be specific to bipolar and amacrine cells ( Figure 6G), consistent with its role in specifying these cell types during development. 42,43 Our results suggest that rs1874459 may act on CDH11 in bipolar and amacrine cells by creating a new bHLH domain recognized by NEUROD4.

DISCUSSION
In this study, we applied single-cell multiomics, HiChIP, eQTL analysis, and base-resolution deep learning to the human retina to decipher the role of noncoding risk variants in five eye diseases. Integrating these methods allowed us to predict gene and cellular targets in the retina for hundreds of SNPs and nominate dozens as potentially pathogenic and meriting functional validation. From an initial list of more than 7,000 noncoding SNPs, we identified 1,152 located in chromatin accessibility peaks. We subsequently focused on SNPs (1) that were coaccessible with a promoter, (2) whose accessibility correlated with the expression of a nearby gene, (3) that were linked to a gene in 3D space by a H3K27ac HiChIP loop, (4) that demonstrated significant association with a gene based on retina eQTL data, and (5) that were predicted to alter local chromatin accessibility as determined by base-resolution models. We propose that SNPs meeting most or all of these criteria ( Figure S10; Data S6) be prioritized in future validation efforts.
Our findings build upon recent studies that used primarily fetal tissue and stem-cell-derived organoids to map cell-type-specific chromatin accessibility in the human retina. 31,44 Datasets from these studies are a rich resource for decoding retinal development and understanding congenital eye diseases. However, they might not fully recapitulate the biology of the mature retina, making them potentially less suitable for studying vision  = 1,152), randomly selected GCmatched SNPs (n = 9,984), and randomly selected SNPs in scATAC peaks (n = 1,160) that were categorized as high-effect. (C) Top: predicted per-base accessibility for rs1532278 (chr8:27608798) and rs1874459 (chr16:65041801) in M€ uller glia and rod bipolar cells, respectively, as determined by deep learning models. A 100-bp window depicts the importance of each base to predicted accessibility at the SNP, and a 1,000-bp window depicts predicted per-base counts for the reference (blue) and alternate (orange) alleles. SNP bases are highlighted in purple. For rs1874459, similar changes in accessibility were predicted for OFF-cone bipolar, ON-cone bipolar, gly-amacrine, and AII-amacrine cells. Bottom: sequencing tracks of chromatin accessibility near rs1532278 and rs1874459. Genes in the sense and antisense directions are shown in red and blue, respectively. The location of each SNP is depicted by a vertical gray line. Gray arcs indicate predicted target genes for the scATAC peak containing the SNP of interest. Resource ll OPEN ACCESS disorders that present later in life. Here we generated a singlecell multiome that complements prior datasets by pinpointing cellular targets for disease-associated SNPs in the adult human retina. We combined this multiome with multiple orthogonal analyses to define putative SNP-target gene interactions. By performing base-resolution deep learning, we were able to uncover insights not readily apparent from single-cell, HiChIP, and eQTL data, such as the predicted effect of SNPs on TF binding and the directionality of these effects. To facilitate its use, our atlas is publicly available at https://eyemultiome.su.domains/.
Limitations of the study Limitations of this study include the relatively low number of cells profiled from rarer cell types like microglia and retinal ganglion cells. Expansion of the current atlas with more cells and donors would be helpful to resolve functionally distinct cell subtypes and could reveal subtype-specific regions of chromatin accessibility that were overlooked here. During SNP prioritization, use of HiChIP and eQTL data from bulk retina may have also favored SNPs from more abundant cell types. Limited overlap of GWAS hits and significant eQTL because of systemic differences in study design may have led to lower concordance among our prioritization criteria. 45 Finally, it should be noted that the majority of SNPs we examined did not overlap with any chromatin accessibility peaks, suggesting that they were not active in the retina. We hypothesize that many of these unassigned SNPs may instead function in other parts of the eye and thus could not be captured by our analysis. For instance, although the neural retina is damaged in AMD and DR, the retinal pigment epithelium and vasculature, respectively, are thought to be the primary sites of pathology. 46,47 In glaucoma, the trabecular meshwork and ciliary body can modulate disease severity as evidenced by treatments that act on these tissues. 48 Likewise, the choroid and sclera may be involved in myopia, given that they elongate alongside the retina with increasing nearsightedness. 49 Multiomics characterization of these additional ocular regions would enable a more complete understanding of how noncoding SNPs contribute to vision disorders.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following: scRNA-seq data analysis scRNA-seq data from nuclei remaining after quality control filtering and automated removal of doublets were analyzed using Seurat (version 3.1.5). 53 After merging all preprocessed samples into a single Seurat object, gene expression counts were normalized using the NormalizeData function, scaled using the ScaleData function, and batch corrected using Harmony. 54 Graph-based clustering was then performed on the Harmony-corrected data using the top 20 principal components at a resolution of 0.5. Cluster identities were manually annotated based on the expression of genes from published scRNA-seq studies of the human retina. [16][17][18] Marker genes for each cluster were additionally identified using the FindAllMarkers function with a minimum fraction of 0.5 and a log2 fold change of 1 (Data S1). Clusters expressing canonical marker genes from different cell types were designated as putative doublets and excluded, after which re-clustering was performed using the same parameters. Clusters with no detected marker genes were also excluded, after which the dataset was also re-clustered. Clusters in the final dataset representing subpopulations of the same cell type were grouped together for downstream analyses.
scATAC-seq data analysis scATAC-seq data were analyzed using ArchR (version 1.0.1) based on barcoded cell type identities from scRNA-seq. 52 For each cell type, pseudo-bulk ATAC replicates were created using the addGroupCoverages function with default parameters, which generated between two to five replicates depending on how many cells of that type were present in each sample. Chromatin accessibility peaks on chromosomes 1-22 and X and outside of blacklist regions were then called using the addReproduciblePeakSet function and MACS2, 55,56 with scATAC peaks for each cell type defined as those present in at least two pseudo-bulk ATAC replicates (Data S2). Marker peaks were identified using the getMarkerFeatures function with a log2 fold change R1 and false discovery rate % 0.01 as determined by Wilcoxon pairwise comparisons (Data S3). Promoter peaks were defined as scATAC peaks within 2,000 bp upstream or 100 bp downstream of a TSS, and peaks co-accessible to promoter peaks were identified using the getCoAccessibility function with a correlation cutoff of 0.3 and resolution of 1. Predicted target genes for each scATAC peak were generated using the getPeak2GeneLinks function integrating barcode-matched RNA expression data from scRNA-seq with a correlation cutoff of 0.3 and resolution of 1. Nearest genes were determined using the BEDTools closest function based on gene annotations from TxDb.Hsapiens.UCSC.hg38.knownGene. 57 Bulk ATAC-seq data analysis Bulk ATAC-seq analysis was performed on published ATAC-seq data from five healthy human retinas. 20 After adapter trimming, fastq files were mapped to the hg38 genome using Bowtie2 and filtered to remove PCR duplicates and retain reads from only chromosomes 1-22 and X. 58 Peak calling was then conducted individually on each sample using MACS2, 55 followed by exclusion of peaks in blacklist regions. 56 Peak calls present in at least two of the five retinas were included in the bulk ATAC-seq peak set.

Sequencing tracks
Sequencing tracks of chromatin accessibility were generated in ArchR using the plotBrowserTrack function and were normalized by the total number of reads in TSS regions. 52 In some cases, bigWig files generated in ArchR using the getGroupBW function were visualized on the WashU Epigenome Browser. 59 All data were aligned and annotated to the hg38 reference genome unless otherwise stated.
Motif enrichment analysis TF motif enrichment analysis was performed on scATAC peaks using the peakAnnoEnrichment function in ArchR with default parameters based on position frequency matrices from Cis-BP (Data S4). 52,60 Footprinting analysis of TFs was conducted using the get-Footprints function in ArchR. 52 To correct for Tn5 insertion bias, the Tn5 insertion signal was subtracted from footprinting signals prior to plotting.

SNP selection and LD expansion
Index SNPs implicated in AMD, glaucoma, DR, myopia, or MacTel and located on chromosomes 1-22 and X were collected from the NHGRI-EBI GWAS Catalog, a curated collection of human GWAS. 25 LD expansion was then performed using LDlinkR to add any SNPs in LD with each index SNP, 61 defined as a LD R 2 value >0.9 in the phase 3 genotypes of the 1000 Genomes Project. 26 LD expanded SNPs were filtered to exclude variants in coding regions based on annotations in dbSNP to obtain the final set of noncoding SNPs (Data S6). 62 A list of all GWAS used in this study is provided in Table S3.
HiChIP data analysis HiChIP sequencing files were initially processed using the HiC-Pro pipeline (version 2.11.0) to remove duplicate reads, assign reads to MboI restriction fragments, filter for valid interactions, and generate binned interaction matrices. 63 Filtered read pairs from HiC-Pro were subsequently converted into .hic files and inputted into HiCCUPS from the Juicer pipeline to call loops (Data S7). 64 HiChIP interaction maps depicting all valid interactions identified by HiC-Pro were visualized using Juicebox. 65 Cell Genomics 2, 100164, August 10, 2022 e3 Resource ll OPEN ACCESS eQTL analysis Retina eQTL data were obtained from the Eye Genotype Expression (EyeGEx) database. 15 Each of the 1,152 SNPs overlapping with a scATAC peak was searched in the database and the nominal p value of any gene associations with that SNP noted. Adjusted p values were calculated by multiplying the nominal p value by the number of SNP-gene pairs tested for that SNP. Interactions with an adjusted p value < 0.05 were considered significant.
Deep learning model training scATAC-seq reads from the Cell Ranger ARC pipeline were aggregated by cell type to generate cell type-specific fragments files. The fragments files were converted to BigWig tracks of base-resolution Tn5 insertion sites with an +4/-4 shift to account for Tn5 shift. For each cell type, in addition to the peak regions, we selected an equal number of non-peak regions that were matched for GC content in their peaks. We then trained cell type-specific BPNet models to predict the log counts and base-resolution Tn5 insertion profiles as previously reported. 8,12 Briefly, the BPNet model takes as input a 2,114 bp one-hot encoded input sequence and predicts the ATACseq profile and log counts in a 1,000 bp window centered at the input sequence. Following BPNet formulation, we used a multinomial negative log likelihood (MNLL) for the profile output of the model and a mean square error (MSE) loss for the log counts output of the model. The relative loss weight used for the counts loss was 0.1 times the mean total counts per region. During each epoch, training examples were jittered by up to 500 bp on either side and a random half of the sequences were reverse complemented. Each batch contained a 10:1 ratio of peaks to non-peak regions. Five models were trained for each cell type corresponding to five disjoint training folds. Model training was performed using Keras/Tensorflow 2.

SNP scoring with BPNet
To score LD expanded SNPs associated with eye disease, we centered the input window at the SNP and obtained the log2 fold change in predicted counts between the reference and alternate alleles for each cell type-specific model. We averaged the log2 fold change over the five model folds for each SNP and cell type. To obtain p values, we performed one-sided Poisson tests of the predicted alternate allele count with the rate parameter set to the predicted reference allele count (counts averaged over five folds). For each SNP, we combined p values across cell types with Fisher's method and performed Benjamini-Hochberg correction. SNPs with an absolute fold-averaged log2 fold change >0.5 and false discovery rate <0.01 were assigned putative ''high effect'' annotation. To obtain a background set, random noncoding SNPs were chosen by shuffling a list of all SNPs from the 1000 Genomes Project, 26 filtering out coding regions, and selecting the first 10,000 entries. Only random SNPs localized to chromosomes 1-22 and X were then retained, leaving 9,984 background SNPs. Background SNPs had similar GC content as disease-associated LD expanded SNPs (51% versus 52%) and were scored as described above. Base importance tracks were visualized using Logomaker. 66 e4 Cell Genomics 2, 100164, August 10, 2022