Annotation of susceptibility SNPs associated with atrial fibrillation

Objective: Genome-wide association studies (GWAS) and the candidate gene based association studies have identified a panel of variants associated with atrial fibrillation (AF), however, most of the identified single nucleotide polymorphisms (SNPs) were found located within intergenic or intronic genomic regions, and whether the positive SNPs have a real biological function is unknown, and the real disease causing gene need to be studied. Results: The current results of the genetic studies including common variants identified by GWAS (338 index SNPs) and candidate gene based association studies (40 SNPs) were summarized. Conclusion: Our study suggests the relationship between genetic variants and possible targeted genes, and provides insight into potential genetic pathways underlying AF incidence and development. The results may provide an encyclopedia of AF susceptibility SNPs and shed light on the functional mechanisms of AF variants identified through genetic studies. Methods: We summarized AF susceptibility SNPs identified by GWAS and candidate gene based association studies, and give a comprehensive functional annotation of all these AF susceptibility loci. by genomic annotation, microRNA binding prediction, promoter activity analysis, enhancer activity analysis, transcription factors binding activity prediction, expression quantitative trait loci (eQTL) analysis, long-range transcriptional regulatory function analysis, gene ontology and pathway enrichment analysis.

AGING AF has shown by the identification of AF-causing mutations or rare variants in some families with lone AF, which occurs in structurally normal hearts and without known risk factors [6][7][8]. Meanwhile, in general AF, the non-hypothesis-driven genome-wide association studies (GWAS) and the candidate gene based association studies have identified a panel of common variants confer risk to AF [9][10][11]. These studies have set up a key role for the genetic background in generating for AF.
GWAS investigate associations between genomic variants and a disease or trait at the whole genome level without priori assumptions of genomic locations or potential functions of candidate genes. In this case, most of the identified single nucleotide polymorphisms (SNPs) associated with disease were found located within intergenic or intronic genomic regions, and whether the positive SNPs have a real biological function is unknown, and the real target gene need to be further studied [12,13]. For example, SNP rs2200733 on chromosome 4p25 is the first risk variant for AF identified by GWAS and is the most robustly replicated AF locus to date. The gene that closest proximity to rs2200733 and other AF susceptibility variants in 4q25 is the PITX2. Studies in mice showed that pitx2 haploinsufficiency promotes an atrial arrhythmia [14]. However, functional evidence about the mechanisms linking these non-coding variants with PITX2 or the incidence of AF is limited, until a recent study found that these non-coding variants in 4q25 possessing a long-range enhancer-promoter interactions and exert as a transcriptional regulatory directed function at PITX2 [15]. Understanding the biological nature of non-coding variants associated with AF can enable us to point the real causal genes causing AF and provide insight into the mechanism of AF.
Considering one of the most important challenges of AF genetic study is to elucidate functional mechanisms that how the susceptibility loci modulate AF risk, in the current study, we summarized the results of the studies including variants identified by GWAS and candidate gene based association studies, and give a comprehensive functional annotation of all these AF susceptibility loci. The non-synonymous SNPs were first identified and classified as functional SNPs, and for SNPs in the non-coding region, we try to predict their potential functions including microRNA binding, promoter activity, enhancer activity, transcription factors binding activity, expression quantitative trait loci (eQTL), and long-range transcriptional regulatory function. Our results may provide an encyclopedia of AF susceptibility SNPs and shed light on the functional mechanisms of AF variants identified through genetic studies.

AF susceptibility loci
Through searching the public databases including GWAS catalog (https://www.ebi.ac.uk/gwas/), GWAS central (https://www.gwascentral.org), and literatures in Pubmed, Embase and Medline, we included 18 AF GWAS and exome-wide association study (EWAS) in our study, which published from 2007 to 2019 (Table 1). The workflow of the current study is shown in Figure 1. Participants of these studies were mainly European ancestry (15 of 18 studies), and the rest were East Asian (Korean ancestry and Japanese) ( Table 1). A total of 338 SNPs (refer as index SNPs) passed the multiple corrections (P<5×10 -8 or corresponding multiple correction threshold) and showed a significant association with AF in GWAS and EWAS (Figure 2 and Supplementary Table 1). We also include 40 common SNPs which showed significant associated with AF in case control or population prospective study in candidate gene based analysis, or replication study of GWAS loci (Figure 2 and  Supplementary Table 1). Totally, we included 378 AF susceptibility SNPs in our further functional annotation.

Functional annotation of AF susceptibility SNPs in non-coding regions
According to the data of the chromatin state and modification of histone binding, a total of 250 SNPs in non-coding regions were identified as located in enhancer regions or might affect the histone mark of promoters and enhancers (Supplementary Table 1), and further analysis found that 65 of them may change the situation of interaction with transcription factors (Supplementary Table 1). 40 transcription factors were found interact with these SNPs. After corrected by genome-wide expected binding ability, these SNPs were significantly enriched for disruption of 3 TFs including STAT6 (P=0.02), REST (P=0.05) and NFIC (P=3.86x10 -3 ) (Figure 4).

eQTL analyses
SNPs in the non-coding region may associate with the expression levels and act as eQTL. We assessed the data from GTEx database (https://gtexportal.org/home/) and evaluate whether AF susceptibility SNPs affect the target gene expression levels. The results showed that 151 SNPs can affect the expression levels of a total of 328 target genes, and 81 of them associated with the expression levels of the closest gene (Supplementary Table 1). Combined with the TF binding data, 39 eQTL effect SNPs were found may alter the binding with transcription factors (Supplementary Table 1).

Long-range transcriptional regulatory function predictions
We used 3dSNP database (http://cbportal.org/3dsnp/) to analyze whether AF susceptibility SNPs affect distal target genes through topological interactions and function as long-range transcriptional regulatory elements. Results indicated that a total of 211 SNPs interact with distal target genes, and 104 of them exert as an eQTL effect (Supplementary Table 1).

Gene ontology and pathway enrichment analyses of eQTL targeted genes
eQTL targeted genes of AF were mapped onto Gene ontology (GO) database using three primary categories including molecular function, protein class and biological process via PANTHER (http://www. pantherdb.org). The results showed that AF related genes were mainly enriched in binding, cellular process, metabolic process, protein modifying enzyme, genespecific transcriptional regulator and membrane traffic protein ( Figure 5).
All eQTL targeted genes of AF were subjected to pathway enrichment analysis in the Search Tool for the Retrieval of Interacting Genes (STRING, v11.0, http://string-db.org). Statistical enrichment tests were executed on gene lists within the STRING by Gene Ontology and pathway annotations. The results uncovered some signaling pathway may play roles in AF including "organelle organization", "striated muscle cell development", "nuclear migration", "endomembrane system organization" and "striated muscle cell differentiation" ( Table 2).

DISCUSSION
Population-based genetic analysis including GWAS and candidate gene based analysis has identified several SNPs associated with the risk of atrial fibrillation, here, we summarized the current results of the common variants conferred risk to AF and totally including 378 SNPs. Considering most of the AF susceptibility SNPs were located in the non-coding genomic regions, we give a comprehensive functional annotation of all these AF susceptibility SNPs through microRNA binding prediction, promoter and enhancer activity prediction,  transcription factors binding activity prediction, eQTL analysis, and long-range transcriptional regulatory function predictions.
Our functional annotation found that 151 AF susceptibility SNPs showed an eQTL effect, and 238 SNPs in non-coding regions were identified as located in enhancer regions or might affect the histone mark of promoters and enhancers, Previous studies also showed that 50-60% of the traits associated non-coding variants identified by GWAS were found located in DNase I hypersensitivity regions [71,72], and these results also suggest that most of the SNPs identified by the GWAS as predisposing to atrial fibrillation may have biological functions and exert regulatory effects. Our results also showed that only 81 of the 151 eQTL SNPs associated with the expression levels of the closest genes, and a total of 328 target genes were identified affected by AF susceptibility SNPs. Our results identify novel genes that may be associated with the occurrence or development of AF. For example, rs35006907 located in 139bp upstream of a non-coding RNA gene LINC00964, was found associated with the expression level of MTSS1 gene (P=2.02×10 -18 ) in the left ventricle, which 119 kb downstream of rs35006907. Rs35006907 was predicted within an enhancer in several types of tissues including the right ventricle and right atrium, and long-range transcriptional regulatory function predictions also showed that rs35006907 and its located enhancer can interact with MTSS1 through long-range 3D chromatin loops. MTSS1 can promote actin assembly at intercellular junctions and a recent functional study indicated that rs35006907 showed a cardioprotective effect [73].
Another interesting finding is about AF susceptibility loci in 10q22, which was reported as the first genetic locus for familial atrial fibrillation by Brugada R et al. in 1997, and SNPs including rs10824026 [28,44], rs7394190 [21], rs6480708 [17] and rs60212594 [17] in 10q22 and upstream of SYNPO2L gene were found robustly associated with AF in several GWAS project. What is more, a missense variant in SYNPO2L, rs3812629 (p.Pro707Leu) was found to confer risk to AF in the Framingham population by Whole Exome Sequencing in Atrial Fibrillation [25] ( Figure 6A). However, our eQTL analysis using GTEx data showed that all these GWAS positive AF SNPs including rs10824026, rs7394190, rs6480708, and rs60212594 were strong associated with the expression level of MYOZ1 in human atrial appendage tissue with a P value from 1.3x10 -28 to 1.4x10 -45 ( Figure  6B). Furthermore, the missense variant in SYNPO2L, rs3812629 (p.Pro707Leu), which confer risk to AF, was also found associated with MYOZ1 expression level in human atrial appendage tissue, and the median normalized expression level of MYOZ1 in homozygous risk allele GG carriers was -0.28 and extremely lower than in heterozygous GA carriers (0.94) ( Figure 6B). MYOZ1 encode myozenin 1, which is an intracellular binding protein involved in linking Z-disk proteins, and was known as a calcineurin-interacting protein, and help tether AGING calcineurin to the sarcomere of skeletal and cardiac muscle [74][75][76]. Mutations in MYOZ1 were found in the patient with dilated cardiomyopathy [77][78]. These results suggested that MYOZ1, but not SYNPO2L is the causal gene of AF.
Previously genetic studies in familial or sporadic AF have identified numerous mutations or rare variants that putatively cause AF [5,[79][80][81][82][83], and to recently, 44 genes that putatively cause AF were mapped to pathway of ion channels/ion channels related (ABCC9, HCN4,  KCNA5, KCNE1, KCND3, JPH2, KCNE2, KCNE3 were found have bath rare variants and common variants related with AF. These may result from most mutation screening were carried out in familial AF, early-onset AF or lone AF, and AF patients in GWAS were more complex. In conclusion, we summarized the current results of the genetic studies including common variants identified by GWAS (338 index SNPs) and candidate gene based association studies (40 SNPs), and performed a comprehensive annotation of all these AF susceptibility loci found by GWAS and candidate gene based association. We identified 4 AF susceptibility SNPs in UTRs may change the microRNA binding ability, and 250 AF susceptibility SNPs in non-coding regions were identified as located in enhancer regions or might affect the histone mark of promoters and enhancers, 65 SNPs may change the situation of interaction with transcription factors and totally 40 transcription factors were found interact with these SNPs. Our results also showed that 151 SNPs can affect the expression levels of a total of 328 target genes and 81 of them associated with the expression levels of the closest gene. Long-range transcriptional regulatory function predictions showed that 211 SNPs interact with distal target genes, and 104 of them exert as an eQTL effect. We also performed a GO and pathway enrichment of the AF eQTL genes. Taken together, our study suggested the relationship between genetic variants and possible targeted genes, and provides insight into potential genetic pathways underlying AF incidence and development.

Acquisition of AF susceptibility variants and search strategy
The workflow of the current study is presented in Figure 1.  (Figure 1), and the searching keywords of medical subject headings (MeSH) including "atrial fibrillation" combined with "polymorphism, polymorphisms, variant, variants, single nucleotide polymorphism, single nucleotide polymorphisms, SNP, SNPs". The results of literature searching were eligibility screened by two reviewers based on titles and abstracts. Studies published between 1 January 2007 and 1 November 2019 were included. Only case control association studies or cohort-based prospective studies were included. Functional researches, animal model studies or studies not performed in a population were excluded.
Information of AF GWAS index SNPs was extracted from the database of GWAS catalog, and the threshold of significant level for the association was set as P value lower than 5×10 -8 . For the SNPs from candidate gene based association studies, publications were reviewed by two reviewers independently and extracted the information about the variant(s) and the details of the population. Discrepancies in data extraction were resolved by discussion or submitted to a third reviewer if required. We divided the variants analyzed in candidate gene based studies into three groups, (i) replication study of the GWAS identified susceptibility loci of AF, (ii) novel variants with minor allele frequency (MAF) ≥0.1% (according to 1000 genome phase III global data), (iii) rare variants with a low frequency (MAF<0.1%) associated with AF by candidate gene association study or mutation screening. In our study, we excluded rare variants and mutations in (iii) from our further annotations, for the causal genes harbored mutations or rare variants of AF which were found in families or cohort were well summarized in previously reviews [8,84] The significant level for SNPs in candidate gene based association studies was set as satisfying the Bonferroni correction. To reduce the probability of false positives, we exclude case controls studies if the statistical power <70%. The power was extracted from publications or calculated by PS: Power and Simple Size Calculation software [85].

Genomic region annotations
All AF susceptibility SNPs including index SNPs identified by GWAS and SNPs identified by candidate gene based association studies were first annotated using Variant Effect Predictor (http://asia.ensembl.org/ Homo_sapiens/Tools/VEP, GRCh38) in Ensembl to obtain their genomic region information.

Enhancer prediction and transcription factor (TF) binding sites prediction of AF susceptibility SNPs in non-coding regions
Splicing variants identified by Variant Effect Predictor were classified as functional SNPs directly. Next, for intronic or intergenic SNPs, the chromatin states data from the Roadmap and ENCODE to analyze whether they are overlapping any enhancers in possible AF related tissues and cell types.
For the AF susceptibility SNPs in the non-coding genomic regions, including in UTR, promoter, intron, and intergenic regions, SNP2TFBS database (http://ccg.vital-it.ch/snp2tfbs/) was used to predict potential binding ability between SNPs and transcription factors [88].

Histone modification analysis
SNPs in promoter, intron, and intergenic regions may modify the histone binding ability, and here, using HaploReg (version 4.1) (https://pubs.broadinstitute.org/ mammals/haploreg/haploreg.php) [89], we analyzed whether the identified non-coding AF associated SNPs overlap the major histone modifications (H3K9ac and AGING H3K4me3 for promoter regions, H3K27ac and H3K4me1for enhancer regions) in AF related tissues and cell types.

Long-range transcriptional regulatory function predictions
SNPs in the noncoding region may reside within or near regulatory elements controlling the expression of distal target genes through topological interactions, and using 3DSNP [90], we annotated the possible regulatory effect of identified AF associated SNPs by examining their 3D interactions with distal genes mediated by chromatin loops.

Expression quantitative trait loci analyses
Genotype-Tissue Expression (GTEx) data were used in determining whether identified AF associated SNPs affect gene expression levels. eQTL analysis were performed bases on raw RNA-Seq data (RPKM) by genes from the GTEx V6 analysis freeze (dbGaP Accession phs000424.v6.p1) and included 25 tissues.

Gene ontology (GO) and pathway enrichment analysis of eQTL targeted genes
Gene Ontology including biological process, molecular function, and protein class were annotated using PANTHER (http://www.pantherdb.org). KEGG pathway enrichment analysis were used in the Search Tool for the Retrieval of Interacting Genes (STRING, v11.0, http://string-db.org).