Liver regulatory mechanisms of noncoding variants at lipid and metabolic trait loci

Summary Genome-wide association studies (GWASs) have identified hundreds of risk loci for liver disease and lipid-related metabolic traits, although identifying their target genes and molecular mechanisms remains challenging. We predicted target genes at GWAS signals by integrating them with molecular quantitative trait loci for liver gene expression (eQTL) and liver chromatin accessibility QTL (caQTL). We predicted specific regulatory caQTL variants at four GWAS signals located near EFHD1, LITAF, ZNF329, and GPR180. Using transcriptional reporter assays, we determined that caQTL variants rs13395911, rs11644920, rs34003091, and rs9556404 exhibit allelic differences in regulatory activity. We also performed a protein binding assay for rs13395911 and found that FOXA2 differentially interacts with the alleles of rs13395911. For variants rs13395911 and rs11644920 in putative enhancer regulatory elements, we used CRISPRi to demonstrate that repression of the enhancers altered the expression of the predicted target and/or nearby genes. Repression of the element at rs13395911 reduced the expression of EFHD1, and repression of the element at rs11644920 reduced the expression of LITAF, SNN, and TXNDC11. Finally, we showed that EFHD1 is a metabolically active gene in HepG2 cells. Together, these results provide key steps to connect genetic variants with cellular mechanisms and help elucidate the causes of liver disease.


Introduction
3][4][5] However, for the large majority of these GWAS signals, the underlying variants are mostly noncoding and lack functional data to explain how variants affect genes and how gene dysfunction leads to disease. 6Most GWAS signals are enriched in transcriptional regulatory elements 7 and may affect chromatin accessibility, transcription factor binding, and/or gene expression. 8Transcriptional reporter and nuclear protein binding assays can provide experimental evidence in support of allelic effects on regulatory activity.
Several bioinformatics-based analytical tools can predict candidate target genes for GWAS signals.One approach is to integrate GWAS signals with quantitative trait loci for gene expression (eQTLs) in trait-relevant tissues through colocalization analysis.][10] We previously integrated lipid-related metabolic trait GWAS signals with eQTL and caQTL in the liver to predict candidate variants, regulatory elements, and genes for lipid-related disorders. 11Experimentally testing putative effector genes is particularly difficult for GWAS signals that reside in regions distal to any promoter.For these signals, CRISPR interference (CRISPRi) can be used to functionally assess potential target genes.CRISPRi has been used traditionally to repress a gene by interfering with regulatory activity by the promoter. 12,13][16] Here, we aimed to identify regulatory mechanisms underlying lipid-related metabolic traits in hepatocytes at five GWAS signals.We identified genetic signals that were associated with plasma lipid or liver enzyme levels and that plausibly affected gene expression regulation as identified through colocalization with liver caQTL and eQTL signals. 11,17For selected caQTL variants, we evaluated allelic effects on regulatory activity by performing transcriptional assays.Next, we used CRISPRi to repress putative distal regulatory elements to link them with their target gene(s).Finally, we evaluated one gene, EFHD1, for its role in regulating metabolic activity in hepatocytes.Together, these experiments validate predicted regulatory mechanisms and provide a scalable strategy for future studies.

Prediction of candidate caQTL variants and genes
As described previously, 11 we identified liver caQTL that colocalized with at least one GWAS signal for total cholesterol, triglycerides, high-density lipoprotein-cholesterol, low-density lipoprotein-cholesterol (LDL-C), or the liver enzymes ALT, aspartate aminotransferase (AST), or g-glutamyl transferase (GGT) 5,18,19 (Table S1).We also identified candidate target genes for caQTL using proximity to promoter, colocalization with liver eQTL signals, and liver Hi-C data. 11,17,20Here, we used ENCODE histone chromatin immunoprecipitation sequencing (ChIP-seq) data for the epigenomic marks H3K4me3 and H3K27ac to identify active promoters (H3K4me3 þ H3K27ac) and enhancers (H3K27ac and distal to the transcription start sites), 21 obtained HepG2 chromatin states as ChromHMM from ENCODE, 22 and predicted transcription factors that bind at specific sites using liver ChIPseq data from ENCODE 3. 23 We defined linkage disequilibrium (LD) between caQTL, GWAS, and eQTL variants based on 1000 Genomes European individuals. 24

Cell culture
We obtained the HepG2 human hepatoblastoma cell line from the American Type Culture Collection (ATCC) and cultured cells in MEM-alpha (Gibco no.12561) supplemented with 10% fetal bovine serum (FBS) and 1 mM sodium pyruvate.We obtained SNU-761 cells from the Korean Cell Line Bank and cultured them in RPMI 1640 media (Corning no.10-040-CV) with 10% FBS.HEK293T cells from ATCC were cultured in DMEM (Sigma no.D6429) supplemented with 10% FBS and 200 mM glutamine.All of the cells were maintained at 37 C and 5% CO 2 .For metabolic switching experiments, HepG2 cells were grown in DMEM (Gibco no.A14430) and SNU-761 cells were grown in RPMI 1640 (Corning) without glucose and then supplemented with 5.5 or 11.1 mM glucose or galactose for 6 days before analyzing EFHD1 expression.

Transcriptional reporter assays
To test variants for allelic differences in transcriptional activity, we designed primers (Table S2) to span each caQTL variant and the surrounding accessible chromatin region, amplified products from human DNA, and cloned them into a firefly luciferase vector as described previously. 25,26We cloned putative enhancer sequences into pGL4.23 (Promega) in forward and reverse orientations with respect to the genome and we cloned putative promoter elements into promoter-less vector pGL4.10 in the orientation that matches the direction of gene transcription (Table S2).We cloned and analyzed four to five sequence-verified clones for each allele.We cotransfected HepG2 cells with a luciferase reporter and phRL-TK Renilla reporter vector (Promega) using Lipofectamine 3000 (Life Technologies) following the manufacturer's protocol and measured luciferase activity 48 h after transfection using the Dual-Luciferase Reporter Assay System (Promega).We normalized luciferase to Renilla activity, calculated fold change in luciferase activity relative to an empty vector (EV; pGL4.23 or pGL4.10), and tested for differences in activity using two-tailed Student's t tests.

Electrophoretic mobility shift assays (EMSAs)
We designed and annealed biotin-labeled and unlabeled 17-bp complementary oligonucleotides centered on rs13395911 (Table S2).We conducted EMSAs using the Light Shift Chemiluminescent EMSA kit (Thermo Fisher Scientific) following the manufacturer's protocol.The binding reactions consisted of 6 mg HepG2 nuclear extract (NE-PER Kit, Thermo Fisher Scientific), 20 ng/mL poly(dI-dC), 13 binding buffer, and 400 fmol biotinylated oligonucleotides.To test the specificity of the protein complexes to each allele, we added 15-fold excess competitive unlabeled probes (Table S2) in the binding reactions.For the supershift assays, we preincubated with 4 mg of antibodies to FOXA2 (Santa Cruz, no.sc6554x) or SP1 (Santa Cruz, no.sc-17824x) in the binding reactions before performing EMSAs, as described previously. 25

CRISPRi
After plating 120,000 HepG2 cells/well onto collagen-I (Corning)coated 12-well plates, we transduced with lenti-sgRNAs at 20 MOI, a concentration that was selected based on transduction efficiency and cell health, in the presence of 10 mg/mL polybrene media. 31,32fter 8 h, we replaced growth media with media containing doxycycline (2 mg/mL) and visualized GFP and mCherry expression by fluorescent microscopy.At 72 h or 7 days after transduction, we lysed cells, extracted total RNA (RNeasy Plus, Qiagen), and converted to cDNA (Superscript IV Vilo, Invitrogen).We assessed gene expression by qPCR using TaqMan probes (Thermo Fisher Scientific), using the 2 ÀDDCt method and normalizing to the B2M expression level.

EFHD1 knockdown and cell-based functional assays
We performed short hairpin RNA (shRNA)-based lentiviral-mediated gene knockdown of EFHD1 in HepG2 cells using four shRNA lentiviral plasmid constructs (TL304828, Origene Technologies).We used this system to test gene function because shRNA-based lentiviruses show stronger and more stable knockdown of genes. 33entivirus particles expressing shRNAs were prepared and titrated as described above.At 72 h after transduction, we assessed knockdown efficiency normalized to ACTB by qPCR and selected the two most efficient shRNAs for functional assays.Briefly, we transduced cells for 72 h at 2.5 and 5 MOI before replating them for functional assays.We performed ATP assays (Roche HS II) in 4-h serum-starved cells in media supplemented with 10 mM galactose 34 at a density of 15,000 cells/well of 96-well plates, and we assayed lipid accumulation (Promega) at a density of 80,000 cells/ well in a 96-well plate.After starving cells of serum overnight, we measured glucose uptake in cells at a density of 60,000 cells/ well in a 24-well plate; we then starved the cells of glucose for 1.5 h, preincubated with human insulin (Sigma) in Krebs Ringer Buffer 35 for 30 min, 36,37 and then we measured glucose uptake (Promega).

Results
Selection of candidate regulatory variants at GWAS signals Our previous study identified liver caQTL signals that colocalized with GWAS signals for lipid or liver metabolism and with liver eQTL signals. 11Because caQTL variants could be located up to 1 kb from the region of chromatin accessibility, 11 we focused on variants located within the accessible region and the genes identified by the colocalized eQTL. 171][42][43] The signals were associated with LDL-C, triglycerides, ALT, GGT, and/ or AST levels, 5,18,44 and the genes for the colocalized eQTL signals are ZNF329, GPR180, EFHD1, LITAF, and EPHA2 (Figure S2; Tables 1 and S1).
The signals correspond to genes that may influence metabolism.One GWAS signal for LDL-C colocalized with a liver caQTL signal and a liver eQTL for ZNF329 (Figures 1A-1C), 17,19 and the lead caQTL variant rs34003091 and proxy variant rs35081008 overlap a weak promoter chromatin state in HepG2 (Figure 1D) and liver ChIP-seq regions for 7 transcription factors (Figure 1E).ZNF329 is significantly downregulated in skeletal muscle after weight loss surgery in obese humans. 45A GWAS signal for triglycerides colocalized with an eQTL for GPR180, which has been reported to play a role in vascular modeling 46 and is required for the proper functioning of brown adipocytes. 43A GWAS signal for AST and ALT colocalized with an eQTL for EFHD1, which codes for a mitochondrial protein known to regulate metabolic activity in B cells and neuronal cells 41,42 A GWAS signal for LDL-C 5 (Table S1) colocalized with an eQTL for LITAF, which is a BCL6 target gene that has been shown to enhance autophagy in B cell lymphomas. 47At this signal, we previously demonstrated that rs11644920, which is an LD proxy (r 2 ¼ 0.98) of the lead caQTL variant rs57792815, exhibited significant allelic differences in transcriptional activity and allele-specific protein binding. 11Finally, a GWAS signal for ALT colocalized with an eQTL for EPHA2, which encodes a tyrosine kinase receptor implicated in several clinical and biological processes. 48,49At this signal, rs12057222, which is an LD proxy (r 2 ¼ 0.82) of the lead caQTL variant rs36086195, is not located in the chromatin accessibility region of the caQTL, but in a nearby accessible chromatin region.

Effects of caQTL variant alleles on transcriptional activity
To evaluate the putative functional caQTL variants for allelic effects on transcriptional activity, we performed transcriptional reporter assays in HepG2 cells.We tested regions around the caQTL lead signals near ZNF329, GPR180, EFHD1, and EPHA2 (Table S2).All four elements showed 2-to 150-fold higher transcriptional activity than an EV control, demonstrating that the elements are promoters or enhancers and not repressors (Figures 2 and  S3).The elements near GPR180 (p ¼ 0.008, forward; p ¼ 0.016, reverse) ZNF329 (p ¼ 0.02 reverse), and EFHD1 (p < 0.001 forward; p < 0.001 reverse) showed significant allelic differences in transcriptional activity (Figure 2).Most dramatically, at EFHD1, a 343-bp genomic segment containing the rs13395911-T allele showed a 3-fold increase in transcriptional activity compared to the rs13395911-A allele (p < 0.001).For all three signals, the alleles that showed more enhancer activity were the alleles associated with greater chromatin accessibility 11 and higher gene expression in the liver eQTL data 17 (Table S1).The element at EPHA2 showed relatively modest transcriptional activity and no allelic differences (Figure S3); we did not pursue this element for additional analyses.

Differential binding of nuclear proteins to rs13395911 alleles
Due to the strong allelic differences observed in the transcriptional activity of the EFHD1 element containing rs13395911, we further investigated whether transcription factors may bind differentially to the rs13395911 alleles.We performed EMSAs using HepG2 nuclear protein extract and 17-bp biotinylated probes centered around either rs13395911-A or rs13395911-T (Table S2).The rs13395911-T allele probe showed a band with stronger protein binding than the rs13395911-A allele probe (Figure 2D, gray arrow; lane 6 vs. lane 1).The addition of the excess unlabeled rs13395911-T probe competed away the band better than the excess unlabeled rs13395911-A probe (Figure 2D, lane 7 vs. lane 8), providing further evidence that the protein binds preferentially to the rs13395911-T allele.Because the genomic sequence surrounding rs13395911 has been shown to overlap transcription factor ChIP-seq data 21,23,50  we tested whether the band could correspond to one of these factors.Incubation of the DNA-protein complexes with antibodies to FOXA2 showed a supershift of the T-allele complex (black arrow; lane 9), providing evidence that FOXA2 can bind to rs13395911-T; antibodies to SP1 showed no supershift.These results suggest that FOXA2, a transcription factor known to play a role in liver development and function, 51 may mediate the differential allelic effects of rs13395911 on transcriptional activity in hepatocytes.

Regulatory element links to genes
For two signals, the caQTL variants are not located at promoters.To validate candidate effector genes for the caQTL elements spanning rs13395911 in an EFHD1 intron and rs11644920 in an LITAF intron, we used CRISPRi with lenti-sgRNAs in HepG2 cells expressing dCas9-KRAB (Figure S1; Table S3).We repressed the intronic elements spanning rs13395911 and rs11644920, and after 72 h, we measured their effects on the expression level of EFHD1, LITAF, and two of the next closest genes (Figures 3  and 4).We also measured the expression level of these genes 7 days after transduction, but repression efficiency was poor (Figure S4), suggesting that the growth of unmodified cells over 7 days moderated the effect of repression.At EFHD1, compared to control cells transduced with NTC sgRNAs, cells with sgRNAs targeting the putative enhancer spanning rs13395911 showed 24% lower EFHD1 expression (p ¼ 0.006) (Figure 3).Two additional genes residing within 50 kb of rs13395911, EIF4E2, and GIGFY showed no change in expression (p > 0.05) (Figure 3).At LITAF, compared to control cells transduced with NTC sgRNAs, cells with sgRNAs targeting the putative enhancer spanning rs11644920 showed 25% lower LITAF expression (p ¼ 0.006) (Figure 4).At this signal, targeting the enhancer also reduced the expression of nearby genes SNN and TXNDC11 by 39% (p ¼ 0.002) and 26% (p ¼ 0.005), respectively (Figure 4).These results suggest that the enhancer element spanning rs13395911 may be specific to EFHD1, whereas the enhancer spanning rs11644920 directly or indirectly affects LITAF as well as additional nearby genes.The CRISPRi experiments provided functional support of the regulatory element links to the EFHD1 and LITAF genes and detected additional target genes at the LITAF signal.

EFHD1 metabolic activity in hepatocytes
Because the enhancer element spanning rs13395911 appeared specific to EFHD1 and rs13395911 alleles disrupted binding of a transcription factor known to play a role in liver development and function, we tested the metabolic activity of EFHD1 in hepatocyte cell lines.EFHD1 regulates several mitochondrial activities in immune cells 41 ; how-ever, its role in hepatocytes is unknown.When cultured in glucose media, hepatocytes rely primarily on glycolysis to generate ATP, whereas culturing cells in galactose media requires them to produce ATP via mitochondrial oxidative phosphorylation in the tricarboxylic acid cycle pathway. 52e cultured HepG2 and SNU-761 hepatocytes in two media concentrations of glucose and galactose (5.5 and 11.1 mM) and observed that EFHD1 expression was increased 2-to 3-fold in galactose media compared to glucose media (Figure 5), suggesting that EFHD1 is induced during oxidative phosphorylation in hepatocytes.We further evaluated the role of EFHD1 on hepatic glucose and lipid homeostasis by knocking down EFHD1 using shRNA sequences.Using cells with R82% knockdown of EFHD1, we performed assays for ATP production, lipid accumulation, and glucose uptake; however, we observed no significant differences compared to control HepG2 cells (Figure S5).These results suggest that although EFHD1 is a metabolically active gene in hepatocytes and has an active enhancer element in liver, it may not act directly on these aspects of glucose or lipid metabolism to influence plasma levels of liver enzymes.

Discussion
To gain insight into the regulatory mechanisms of lipid and liver enzyme GWAS signals in noncoding regions, we studied five signals that had been colocalized with liver caQTL and eQTL. 11The colocalizations predicted causal variants and target genes, but functional studies were needed to validate that variants exhibit differential allelic effects on gene expression and function and that distal regulatory elements affect predicted genes.We showed that caQTL variants exhibited significant allelic differences in transcriptional activity at signals located near ZNF329, GPR180, and EFHD1.The alleles that exhibited higher transcriptional activity for these signals were associated with greater chromatin accessibility in the liver caQTL study 11 and with a higher expression level of their corresponding target gene. 17To identify target genes for two nonpromoter elements, we repressed the elements using CRISPRi and observed 20%-25% reduced expression of EFHD1 and LITAF, as well as nearby genes to LITAF, SNN, and TXDC11.These results validate and expand predicted molecular mechanisms at four GWAS signals.
We further evaluated molecular and biological mechanisms at EFHD1.The T allele of the caQTL lead variant rs13395911 was associated with increased liver enzymes ALT, AST, and GGT. 4 We observed that rs13395911 displayed enhancer activity in HepG2 cells and that the rs13395911-T allele exhibited 3-to 4-fold higher transcriptional activity than the rs13395911-A allele.We further observed that the rs13395911-T allele bound more strongly to the transcription factor FOXA2, a nuclear protein that helps regulate liver lipid levels, because FOXA2 overexpression in mice has been shown to improve hepatic insulin sensitivity. 53EFHD1 is a Ca 2þ -binding HNF4A target protein known to be involved in early neuronal differentiation, axonal morphogenesis, and B cell activation. 41,42,54Binding of FOXA2 to rs13395911 at this signal suggests that EFHD1 is a FOXA2 target protein and may influence glucose and lipid metabolism.Next, using CRISPRi, we showed that the enhancer element spanning rs13395911 appeared specific to EFHD1 and did not affect the expression of two other adjacent genes.Finally, we showed that although EFHD1 is a metabolically active gene in HepG2 cells and that its expression was induced by galactose, knockdown of EFHD1 did not induce direct effects on glucose uptake, ATP levels, or lipid accumulation.These results suggest that although EFHD1 has an active enhancer element acting in liver, it may not act directly through the tested processes to alter plasma levels of liver enzymes.
Our analyses and experimental approaches have limitations.Our investigation relied on an in vitro cell model, HepG2, which is a human hepatoma cell line that is highly aneuploid 55 and may not fully represent the biology and complexity of hepatocytes, the liver, or its interaction with other organs.Similarly, although we pinpointed potential regulatory variants and identified their effects on gene expression, other variants at these signals may also affect transcriptional activity.Furthermore, when targeting elements within introns, transcriptional activity may be reduced due to the steric hinderance of RNA polymerase II by the CRISPR machinery, 56 which complicates identifying the effects of the element itself on transcription.Finally, our CRISPRi assays used a mixed cell population expressing the dCas9-KRAB system, limiting the repression of promoters and suggesting that the elements may be stronger en-hancers than is shown by CRISPRi experiments.Further analysis of CRISPR-mediated knockins of specific alleles would more robustly test the effect of variants on function.
Together, the mechanisms and framework presented in this study provide a scalable approach for the identification and follow-up of additional GWAS signals.Integration of GWAS, caQTL, and eQTL signals can identify potential liver regulatory variants and their target genes, enhancing our understanding of liver biology and providing potential avenues for therapeutic intervention.

Figure 1 .
Figure 1.A plausible regulatory mechanism at a locus for LDL-C levels (A-C) rs34003091 association (A) with plasma levels of LDL-C in the multipopulation analysis from Graham et al., 3 (B) liver eQTL for ZNF329 from Etheridge et al., 17 and (C) caQTL associations from Currin et al. 11 The colors are based on the LD (r 2 ) with rs34003091 (purple diamonds) in 1000 Genomes Europeans.The x axis shows the GRCh37/hg19 genomic position in Mb and protein-coding genes, and the y axis shows variant-trait association in -log 10 (p value).(D) rs34003091 and rs35081008 (20 bp apart, red arrow) overlap a liver caQTL peak at the ZNF329 promoter and a promoter chromatin state of HepG2 cells.The region shown includes all LDL-C GWAS variants in strong LD (r 2 > 0.8) with rs34003091.(E) rs34003091 and rs35081008 overlap liver ChIP-seq regions for transcription factors and chromatin marks of active enhancers.

Figure 2 .
Figure 2. Allelic differences in transcriptional function for variants at 3 loci (A-C) Transcriptional reporter activity in HepG2 cells.Transcriptional activity is normalized to empty vector (EV).Dots represent 4-5 independent clones, error bars indicate standard deviations, and pvalues are from 2-tailed t-tests.(A) A 284-bp element spanning rs34003091-T and rs35081008-C near ZNF329 showed increased transcriptional activity compared to rs34003091-C and rs35081008-T.(B) A 331-bp element spanning rs9556404-A near GPR180 showed increased transcriptional activity compared to rs9556404-G in both forward and reverse orientations with respect to the genome.(C) A 343-bp element spanning rs13395911-T near EFHD1 showed stronger transcriptional activity compared to rs13395911-A in both orientations.(D) EMSA using HepG2 nuclear lysate and a probe spanning rs13395911 near EFHD1 showed evidence of differential allelic protein binding (gray arrow).Addition of antibodies to FOXA2 showed a supershift (black arrow, lane 9).The white arrows indicate nonallele-specific protein binding.

Figure 3 .
Figure 3. CRISPRi of the region spanning rs13395911 reduces EFHD1 expression (A) Mechanistic representation of CRISPRi for caSNP rs13395911 (red arrow), which is located between EFHD1 exon 1 and exon 2 and overlaps the caQTL accessible chromatin.The enlarged region shows 5 sgRNAs designed to span the enhancer regulatory region.(B) Gene expression measured after CRISPRi of the rs13395911 region.Compared to a pool of NTC (Control) sgRNAs, a pool of sgRNAs targeted to the enhancer led to lower expression of EFHD1, but not nearby genes EIF4E2 and GIGYF2.''None'' indicates cells with no sgRNAs.Each dot represents the mean of 3 qPCR replicates.Bars show the mean and SD of 4-6 biological replicates from different wells.p values are from 2-tailed t tests.

Figure 4 .
Figure 4. CRISPRi of the region spanning rs11644920 reduces expression of LITAF, SNN, and TXNDC11 (A) Mechanistic representation of CRISPRi for caSNP rs11644920.rs11644920 (red arrow) is located between exon 3 and exon 4 of LITAF and overlaps the caQTL accessible chromatin.The enlarged region shows 5 sgRNAs designed to span the enhancer regulatory region.(B) Gene expression measured after CRISPRi of the rs11644920 region.Compared to a pool of NTC (Control) sgRNAs, a pool of sgRNAs targeted to the enhancer led to lower expression of LITAF, as well as nearby genes SNN and TXNDC11.None indicates cells with no sgRNAs.Each dot represents the mean of 3 qPCR replicates.Bars show the mean and SD of 4-6 biological replicates from different wells.p values are from 2-tailed t tests.

Table 1 .
Liver caQTL variants selected for mechanistic studies the candidate variants selected for study.GWAS variants indicate lead variants from GWAS signals that colocalized with the liver caQTL signals.GWAS proxies and the LD between lead caQTL and GWAS variants are based on 1000 Genomes Europeans.