Regulatory variation within 3’UTR of STAT5A correlates with sudden cardiac death in Chinese populations

Abstract Definitive diagnosis to sudden cardiac death (SCD) is often challenging since the postmortem examination on SCD victims could hardly demonstrate an adequate cause of death. It is therefore important to uncover the inherited risk component to SCD. Signal transducer and activators of transcription 5 A (STAT5A) is a member of the STAT family and a transcription factor that is activated by many cell ligands and associated with various cardiovascular processes. In this study, we performed a systematic variant screening on the STAT5A to filter potential functional genetic variations. Based on the screening results, an insertion/deletion polymorphism (rs3833144) in 3’UTR of STAT5A was selected as the candidate variant. A total of 159 SCD cases and 668 SCD matched healthy controls was enrolled to perform a case-control study and evaluate the association between rs3833144 and SCD susceptibility in Chinese populations. Logistic regression analysis showed that the deletion allele of rs3833144 had significantly increased the SCD risk (odds ratio (OR) = 1.54; 95% confidence interval (CI) = 1.18–2.01; P = 0.000955). Further genotype-expression eQTL analysis showed that samples with deletion allele appeared to lower expression of STAT5A, and in silico prediction suggested the local 3 D structure changes of STAT5A mRNA caused by the variant. On the other hand, the bioinformatic analysis presented that promoters of RARA and PTGES3L-AARSD1 could interact with rs3833144, and eQTL analysis showed the higher expression of both genes in samples with deletion allele. Dual-luciferase activity assays also suggested the significant regulatory role of rs3833144 in gene transcription. Our current data thus suggested a possible involvement of rs3833144 to SCD predisposition in Chinese populations and rs3833144 with potential function roles may become a candidate marker for SCD diagnosis and prevention.


Introduction
Sudden cardiac death (SCD) refers to a sudden and unexpected death or arrest attributed to a cardiovas cular cause, mostly occurring within 1 h from the onset of symptom or within 24 h of last been observed to be alive if not witnessed [1]. As a leading cause of mortality, SCD accounts for half of all car diovascular deaths with an annual incidence of 57.3-110.8 per 100 000 in USA and 40.7 per 100 000 in China [2][3][4]. As work continues to pathological research, coronary heart diseases have been identified as the major etiology for SCD in older people [5], and inheritable arrhythmogenic diseases were chara cterized as an important cause in young [6,7].
Nevertheless, a definitive diagnosis to SCD is still challenging since postmortem examination could hardly demonstrate an accurate cause of death [8].
To uncover the molecular mechanisms of SCD and identify genetic markers, numerous efforts have been put into molecular genetics of SCD during the last few decades [9,10]. For instance, recent study has identified a causative variant within caveolin3 (Cav3) gene in a patient with SCD [11]. Population studies also suggested a familial aggregation of SCD, characterizing the SCD family history as an impor tant risk factor for sudden death [12][13][14]. The casual rare variants with large effect size to SCD may show incomplete penetrance, while common pathogenic variants with small effect size may also generate the clinical picture of SCD. Therefore, it is noteworthy to aggregate common pathogenic variants to establish polygenic risk score analyses for SCD.
Signal transducer and activators of transcription (STATs) are a family of cytoplasmic transcription factors, activated by Janus kinases (JAKs) through tyrosine phosphorylation and translocated into nuclear after dimerization [15,16]. These activated STAT proteins regulate expression of target genes, implicated in reninangiotensin system (RAS) [17], hypertrophy [18], angiogenesis [19,20], fibrosis [21,22] and other cellular processes in the cardiac myo cytes. STAT5A is one member of the family and one of two highly homologous but nonredundant iso forms (STAT5A and STAT5B) [23]. Previous studies [24,25] have identified that STAT5A of JAK/STAT pathway would be selectively activated in ischemia/ reperfusion (I/R) injury and postinfarction remode ling. Moreover, a recent study determined that STAT5A was a target of miR222, which could ham per neovascularization and suppress atherosclerotic disease progression [20]. Considering the functional role of STAT5A in various cardiovascular processes, it merits deciphering the variants on STAT5A gene to explore the potential effect of STAT5A on cardiac lethal events. The 3'untranslated region (3'UTR) is the noncoding part of mRNA, harbour various func tional elements and play an important role in gene regulation [26]. We therefore screened polymor phisms within this region of STAT5A and performed a casecontrol study, finally identifying an insertion/ deletion (indel) variation within STAT5A 3'UTR cor related with SCD susceptibility in Chinese popula tions. Further functional experiments were implemented to investigate underlying mechanisms.

Study populations
A total of 159 SCD cases and 668 healthy controls were enrolled in this casecontrol study. All the enrolled subjects were genetically unrelated Han Chinese. The 159 SCD patients were recruited from 2012 to 2020 at Sun Yatsen University, Soochow University, Academy of Forensic Science, China and Xiangya Medical University. The recruitment criteria for both cases and controls were the same as those described previously [27,28]. Blood samples from SCD patients were assessed by exhaustive toxicological examinations to rule out toxic death. Comprehensive forensic pathological examinations were also con ducted in all the cases and different extents of coro nary atherosclerosis were detected in the corpses. Except these lesions, no other mortal pathological changes were observed during the autopsy. These deceased were thus assumed to suffer sudden death derived from coronary heart diseases. The 668 healthy controls were selected through community nutritional questionnaires performed in the same regions and during the same periods as the victims. They were all frequency matched to SCD victims based on age (±5 years) and gender and identified without any car diovascular disease history or SCD family history. This study was approved by the Ethical Committee of Soochow University. All participants or their relatives provided written informed consents.

Genotyping of STAT5A Indel polymorphism
TIANamp Blood DNA Kit (TIANGEN, Beijing, China) was employed to extract genomic DNA from peripheral blood samples. The primers used for rs3833144 poly morphism were 5′TGAAGCGGTCGTGTTGTGA3′ (Forward), 5′CACATCCCAGGACTGCACA3′ (Reverse), which were generated by Genewiz Company (Suzhou, China). Genotyping analysis of PCR products amplified using above primers was completed with 7% nondenaturing polyacrylamide gel electrophoresis and silver staining [29]. For quality control, 50 random DNA samples were used for validation of genotyping method by means of direct sequencing. Meanwhile, 10% masked random samples were analyzed twice by independent technicians to confirm 100% reproducibility.

Bioinformatic analysis
Mutations within STAT5A 3'UTR were collected from dbSNP database [30]. The genotyping and mRNA expression data available online were employed for genotypeexpression analysis in our study. The RNA sequencing data of 445 lympho blastoid cell lines was available from 1000 Genomes Project (1KGP) database (https://www.ebi.ac.uk/ arrayexpress/experiments/EGEUV3/). The geno types of the variants observed in 2 504 samples in 1KGP database were directly obtained from Ensembl Genome Browser (http://asia.ensembl.org/Homo_ sapiens/Info/Index). Based on above data, expression quality trait loci (eQTL) analysis was performed in R(https://w w w.datavis.ca/R/) to evaluate genotypephenotype association. The eQTL and splicing quality trait loci (sQTL) analysis were also performed in GenotypeTissue expression (GTEx) database. Linkage disequilibrium (LD) analysis was performed using LD heatmap package in R. The putative influence of insertion and deletion alleles on local 3 D structure of STAT5A mRNA was pre dicted using SimRNAweb (https://genesilico.pl/ SimRNAweb/submit) based on a 170basepair region containing variant rs3833144. HaploReg [31] and ENCODE [32] database were used to perform functional annotations of the variant. The chromatin interaction partners for rs3833144 were extracted from highthroughput chromosome conformation capture (HiC) data in 3DIV database [33].

Construction of reporter plasmids
A total of 210bp or 206bp DNA fragments including the variant rs3833144 were directly generated by Genewiz Company. These fragments were directionally subcloned into KpnI and SacI sites of pGL3promoter vector, generating two wild type vectors containing insertion allele (pGL3proWTcis/trans) and two mutant type vectors containing deletion allele (pGL3proMTcis/trans). The sequence and direction of resultant constructs were testified by direct sequencing.

Cell culture and luciferase reporter assay
In vitro experiments were performed using 293 T cell lines purchased from Shanghai Cell Bank of Chinese Academy of Sciences (Shanghai, China). The cell lines were authenticated through short tan dem repeat markers and maintained in Dulbecco's modified eagle medium with 10% fetal bovine serum and 1% penicillinstreptomycin in a humidified atmosphere of 37 °C and 5% CO 2 .
Twentyfour hours after plated in the 24well plates, 293 T cells were transfected with allele and directiondifferent reporter plasmids (about 400 ng) with the corporation of jetPRIME ® transfection reagent (Polyplustransfection ® , Illkirch, France). The empty pGL3promoter vector transfected group was performed as negative control, and approximately 20 ng SV40 vector containing Renilla luciferase gene was cotransfected in each well for normali zation of lucife rase activity. Twentyfour hours after transfec tion, cells were harvested immediately by adding into 100 mL passive lysis buffer. Firefly luciferase activi ty was assessed by the Dual Luciferase ® assay system (Promega, Madison, WI, USA) in FilterMaxF5 (Molecular Devices, San Jose, CA, USA). Each group had four duplicates and every transfection experi ment was repeated at least three times.

Statistical analysis
The X 2 test was performed to measure HardyWeinberg equilibrium for the representative of control samples. Unconditional logistic regression with adjustment for sex and age was used to eva luate the correlations between rs3833144 and SCD risk by estimating odds ratios (ORs) and their 95% confidence intervals (CIs). The mRNA expression levels between samples with different genotypes were compared by Oneway ANOVA in eQTL ana lysis. Student's t test was performed to evaluate the difference of the luciferase activities. The sta tistical analyses were implemented by Statistic Analysis System software (version 8.0; SAS Institute, Cary, NC, USA), a Pvalue of less than 0.05 was considered as the criterion of statistical significance. All statistical tests were twosided in our study. The statistical power of the current sample size was calculated using the G*Power 3.1 software [34].

Bioinformatic screening of variants within 3'UTR of STAT5A gene
The workflow for this study was presented in Figure 1. All mutations in STAT5A 3'UTR with frequency available online were summarized in Supplementary  Table S1. Among these mutations, only four muta tions (rs3833144, rs3198502, rs115456777 and rs73983709) had a minimum allele frequency (MAF) of more than 0.01. Since rs115456777 was observed with only two genotypes, we excluded this variant to avoid potential deviation from HardyWeinberg equilibrium. Genotypephenotype eQTL analysis was subsequently performed between remaining three variations and STAT5A mRNA expression levels by using genotyping and mRNA expression data from 1000 G database. As shown in Supplementary Table S2, only rs3833144 polymor phism showed a significant genotypephenotype association with its host gene STAT5A (P = 0.037), compared with rs3198502 (P = 0.273) and rs73983709 (P = 0.096). Also, the expression levels in samples with ins/del and del/del genotypes were lower than that with ins/ins genotype, which was shown in Figure 2A. Furthermore, we observed that rs3198502 was an sQTL significant variant for STAT5A and an eQTL significant variant for STAT3 based on GTEx database ( Figure 2B and C), while rs72983709 was not correlated with STAT5A transcription. Finally, our LD analysis suggested that rs3833144 and rs3198502 were in one LD block ( Figure 2D). Unlike rs3198502, indel polymorphism rs3833144 contained the characteristic of length polymor phism, which was compatible with capillary elec trophoresis (CE) platform commonly applied in forensic practice. We therefore chose the indel poly morphism rs3833144 to identify its association with risk to SCD based on casecontrol study.

Correlations between STAT5A rs3833144 polymorphism and SCD susceptibility
The demographic characteristics of both SCD subjects and their matching controls in present study were    Table 1. The characteristics of the SCD cases are listed in Supplementary Table S3. The median age of the SCD patients was 50 years old. Among the case group, there was a remarkable discrepancy in the amount of two genders, at a male/female ratio of 9.6:1, suggesting male as a significant risk factor for SCD. More than half (58%) of the death occurred without previous symptom, and 70 cases (44%) took place without heavy physical or emotional stress, indicating that SCD was an unexpected and unpredictable dis ease. Examples of genotyping and sequencing output were displayed in Figure 3. The genotype frequencies of STAT5A rs3833144 observed in the controls were in correspondence with HardyWeinberg equilibrium. The G*Power 3.1 software was employed, examining a statistical power of 0.712 with α set at 0.05 under the dominant model.
Genotypic frequencies of rs3833144 along with OR and its 95%CI in both cases and controls were pre sented in Table 2 ). As a result, these findings indicated that deletion allele of rs3833144 would contribute to a higher susceptibility to SCD.

rs3833144 alters local 3D structure of STAT5A mRNA
Although rs3833144 locates in 3'UTR of STAT5A gene harbouring abundant miRNA binding sites, it was dif ficult to predict a matched miRNA whose binding could be interrupted by this "GTGT" indel variant, since "GT" repeats significantly abound in this locus.
RNA structure plays important roles in gene expression regulation and biological function. To explore mechanisms underlying the genotype expression association between rs3833144 and STAT5A presented by our eQTL analysis, we used SimRNAweb to predict local 3 D structure of STAT5A mRNA. As presented in Figure 4, the insertion of "GTGT" or not contributed to a   visua lized discrepancy in mRNA local structures. The altered local structure within 3'UTR might influence mRNA expression of STAT5A at a posttranscriptional level.

Functional annotation and bioinformatic analysis of rs3833144
To further interrogate the potential regulatory roles of variants rs3833144, we next conducted in silico functional annotation based on Haploreg and ENCODE database. As summarized in Supplementary  Table S4, among most of cell lines and tissues avail able in Haploreg database, the rs3833144 polymor phism located in a region with enrichment of H3K4me1 and H3K27ac and depletion of H3K4me3. Similar modification profiles can also be found in ENCODE database, which were shown in Figure  5A. These results all indicated that rs3833144 locus may reside in a longrange regulatory element. We therefore speculated that some nearby genes may be interacting with and regulated by this regulatory element, and examined all genes located within 2 Mb for interactions based on HiC data from 3DIV database. Two tissues, left and right ventricle, were chosen as model systems and distance normalized interaction frequency >2.00 was considered as the criterion of biologically significant interactions. The significant chromatin interaction partners (genes) for rs3833144 locus were listed in Supplementary  Table S5 and S6. A total of 47 (left ventricle) and 43 (right ventricle) genes were identified to possibly interact with rs3833144 locus, and 26 genes were found in both two tissues.
To further investigate whether the transcription of interaction genes would be influenced by rs3833144 through an alleledependent manner, we performed genotypeexpression eQTL analysis between rs3833144 and the interaction genes. Only two genes, retinoic acid receptor alpha (RARA) and PTGES3LAARSD1 readthrough (PTGES3L-AARSD1) were identified to possess genotypephenotype correlations with rs3833144. Intriguingly, the chromatin interactions of the two genes with rs3833144 were not found in either of the two tissues ( Figure 5B). As shown in Figure 5B and C, the mRNA level of RARA (P = 0.029) and PTGES3L-AARSD1 (P = 0.014) both appeared to an apparent difference between samples with different geno types, manifesting an increasing trend in samples with ins/del and del/del genotypes compared with ins/ins genotype. These findings revealed the possi bility that the rs3833144 locus might interact with the promoters of RARA and PTGES3L-AARSD1 and thereby regulate the gene transcription through a longrange regulation mechanism.

The regulatory role of rs3833144 on gene transcription activity
Based on HiC data and eQTL analysis, we have observed the chromatin interactions and genotypephenotype correlations between rs3833144 and two interaction genes. To validate the effects of the indel variant on transcription activity, we next performed luciferase reporter assays using pGL3promoter vector with allele and directiondifferent fragments harbouring rs3833144 ( Figure 6A). As shown in Figure 6B, the luciferase activities appeared to be significantly different between the groups trans fected with vectors harbouring insertion allele (pGL3proWT) and the groups harbouring deletion allele (pGL3proMT), regardless of 5′ or 3′ direction of the fragments (P<0.01). Moreover, the constructs harbouring trans (3′→5′) fragments had significant higher luciferase expression than constructs harboring cis (5′→3′) fragments (P<0.01, P<0.001). They both had significant higher luciferase expression than fragments transfected with pGL3prometer (P<0.001). These results all demonstrated the regulatory properties of rs3833144 on gene transcription.

Discussion
In this study, a novel indel polymorphism has been identified to associate with risk of SCD through bioinformatic screening and subsequent casecontrol study. On the basis of our bioinformatic and func tional analysis, we discovered that rs3833144 played a functional role probably through the following two manners: firstly, rs3833144 might affect 3 D structure of STAT5A mRNA and thereby influence its expres sion through a structuredependent manner; se condly, rs3833144 could interact with promoters of nearby genes and regulate gene transcription activities via a longrange mechanism. These results uncovered the possible biological mechanisms underlying the correlation between rs3833144 and SCD risk. Thus, the indel polymorphism may become a potential marker for forensic molecular diagnosis and genetic counseling of SCD.
As a signal transducer and transcriptional factor (TF), the role of STAT5A in the progression of car diovascular diseases remains contradictory. On one hand, STAT5A would be activated by JAK2 during ischemia/reperfusion (I/R) injury and bind to the spe cific domain located in promoter of angiotensinogen gene, contributing to activation of cardiac renin angio tensin system (RAS) [24]. Moreover, depression of STAT5A attributed to miR222 in endothelial cells would protect against advanced neovascularized athe rosclerotic lesions [20]. These results suggested that STAT5A would play a pathological role in atheroscle rosis and myocardial ischemia injury. On the other hand, there was also some evidence indicating the cardioprotective role of STAT5A. For instance, STAT5A was reported to be indispensable for the cardiacpro tection of ischemia preconditioning (PC) against myo cardial I/R injury [35,36]. When treated with berberine, relaxin, a protein known for its antifibrotic effects would be endogenously upregulated because of reduced STAT3 and increased STAT5A binding to the relaxin promoter [22]. The discrepancy might be attributed to different upstream stimulations and downstream targets for STAT5A. Despite the contra dictory role of STAT5A in the heart function, there is no doubt that either upregu lation or repression of STAT5A would trigger some transient factors and cause malignant effects on cardiovascular system. In our present study, we have identified that the deletion allele of rs3833144 linked to higher SCD susceptibility, together with repression of STAT5A mRNA expression. Previous study have reported that genetic variants The luciferase activities were compared between construct groups containing different alleles in 293T lines, regardless of 5' or 3' direction of the fragments ( ### P < 0.001, compared with pGL3-proWT in cis group; ## P < 0.01, compared with pGL3-proWT in trans group). The luciferase activities of both insertion or deletion construct group were compared between construct groups including different directions of the fragments (**P < 0.01); cells transfected with pGL3-proWT-cis/trans or pGL3-proMT-cis/trans showed a significantly higher luciferase activity as compared with cells transfected with pGL3-promoter ( +++ P < 0.001).
could change RNA local structure, affect the mRNA stability and regulate gene expression [37,38], and our in silico ana lysis exhibited the altered 3 D struc tures of STAT5A mRNA caused by different alleles of the variant indeed. We therefore proposed a hypo thesis that the deletion allele of rs3833144 may impair the stability of STAT5A mRNA, which thereby con tributes to aberrant suppression of STAT5A expression, interfere cardioprotective role of STAT5A and finally facilitate the occurrence of SCD.
SCD is a complex disease caused by various fac tors, such as cardiac diseases, genetic factors, drastic emotional change or exercises. Consistently, STAT5A regulation may also be interfered by various regula tory factors or genetic variants. Our eQTL analysis based on 1000 G database have identified that rs3833144 is associated with STAT5A mRNA expres sion. Apart from this variant, the sQTL analysis per formed based on GTEx database also uncovered that rs3198502 may influence intron splicing of STAT5A, which may interfere STAT5A function. These bioin formatic results have suggested two possible regula tory factors for STAT5A regulation. Further, our LD analysis also shows that polymorphism rs3833144 and rs3198502 are in one LD block. Considering the con tribution of rs3833144 to SCD risk, we therefore speculated that rs3833144 and rs3198502 may play a synergetic role in STAT5A regulation as well as SCD occurrence.
Cisregulatory elements (CREs), such as promo ters, enhancers, silencers, and insulators, are noncoding DNA regions which regulate gene expression at a transcriptional level. Genomewide association studies have suggested a series of diseaseassociating noncoding mutants could dis rupt the CREs and influence expression of nearby genes via a longrange mechanism [39][40][41]. Our functional annotations found that the region con taining rs3833144 showed enrichment of H3K4me1 and H3K27ac and depletion of H3K4me3, which were all indicative markers of enhancers [42]. Further bioinformatic analysis identified that RARA and PTGES3L-AARSD1 interacted with rs3833144, and eQTL analysis manifested significant genotypeexpression correlations between rs3833144 and the two genes with a higher expression linked to deletion allele. The luciferase assays finally demonstrated the regulatory role of the variant on gene expression, suggesting the possibility that rs3833144 may reside in a potential CRE and control the transcription of RARA and PTGES3L-AARSD1 through a longrange mechanism.
RARA, a member of nuclear retinoic acid (RA) receptor family, acts as a transcription factor and control gene transcription through a liganddependent manner. It has been reported that cardiac impairment of RARA would lead to the development of diastolic dysfunction [43], while activated RARA would pro tect artery from atherosclerosis by derepressing miR10a in vascular endothelial cells in oscillatory shear regions [44]. Despite these cardioprotective role, overexpression of RARA could also stimulate differentiation of aortic endothelial cells and enhance RAinduced angiogenesis [45]. Considering that increased neovascularization within atherosclerotic plaques would enhance the risk of plaque rupture [46], RARA overexpression may play a pathological role in plaque rupture and finally increase SCD risk, especially to the victims with high extents of athe rosclerosis. PTGES3L-AARSD1 represents naturally readthrough transcription and encodes a fusion pro tein of two nearby genes: prostaglandin E synthase 3 like (PTGES3L) and alanyltRNA synthetase domain containing 1 (AARSD1). Although few studies have been reported about PTGES3LAARSD1, this kind of transcriptioninduced chimera may harbour the properties of both genes [47]. PTGES3L encodes a protein possessing similar region with prostaglandin E synthase 3 (PTGES3), a synthase of prostaglandin E2 (PGE2). It is previously reported that PGE2 could contribute to cardiac hypertrophy [48], atherogenesis [49] and inflammatory cardiomyopathies [50]. Meanwhile, both PTGES3 (also known as p23) and AARSD1 functions as cochaperones of heat shock protein 90 (HSP90), which participates in cardiac hypertrophy and heart failure [51]. Based on the evidence exemplified above, we could certainly specu late that the PTGES3LAARSD1 play a promi nent role in cardiovascular diseases and dysregulation of PTGES3LAARSD1 would be harmful to human heart. Taken together, it is plausible that the over expression of RARA and PTGES3L-AARSD1 derived from deletion allele of rs3833144 might cause malig nant changes to our heart and participate in the pathophysiology of SCD.
Some limitations should be emphasized in our study. Since the RNAseq and genotyping data of cardiomyocytes are not available at present, our eQTL analysis mainly focused on lymphoblastoid cell lines in 1KGP database. Therefore, based on limited bio informatic results we can only propose possible hypotheses about how the variant rs3833144 contrib ute to SCD risk, further functional assays are needed to verify our results. Additionally, the significance of the finding that the deletion allele of rs3833144 con tributed to the SCD risk are still limited by small sample size in our study. Further replicated casecontrol studies with different or expanded pop ulations are needed to guarantee our observations. Finally, since our SCD cases mostly suffer sudden death originated from coronary atherosclerosis, our results raised the possibility that rs3833144 may be associated with coronary atherosclerosis susceptibility in natural populations. However, this implication still needs to be further investigated by coronary athero sclerosis related studies, which may have potential interests for cardiovascular community.

Conclusion
Here, we have provided the initial evidence that the indel polymorphism (rs3833144) within the 3'UTR of STAT5A gene was significantly associated with SCD risk. Our current data thus suggested a possible involvement of rs3833144 to SCD predis position in Chinese populations and rs3833144 with potential function roles may become a can didate marker for SCD diagnosis and prevention.