Genetic analysis of nonalcoholic fatty liver disease within a Caribbean–Hispanic population

Abstract
We explored potential genetic risk factors implicated in nonalcoholic fatty liver disease (NAFLD) within a Caribbean–Hispanic population in New York City. A total of 316 individuals were analyzed: 40 subjects with biopsy‐proven NAFLD, 24 ethnically matched non‐NAFLD controls, and 252 individuals from an ethnically mixed random sampling of Bronx County, New York. Genotype analysis was performed to determine allelic frequencies of 74 known single‐nucleotide polymorphisms (SNPs) associated with NAFLD risk in previous genome‐wide association studies (GWAS) and candidate gene studies. Additionally, the entire coding region of PNPLA3, the gene showing the strongest association with NAFLD, was subjected to Sanger sequencing. Results suggest that both rare and common DNA variations in PNPLA3 and SAMM50 may be correlated with NAFLD in this small population study, while common DNA variations in CHUK and ERLIN1 may have a protective interaction. Common SNPs in ENPP1 and ABCC2 show a suggestive association with fatty liver, but with less compelling significance. In conclusion, Hispanic patients of Caribbean ancestry may have different interactions with NAFLD genetic modifiers; therefore, further investigation of this Caribbean–Hispanic population with a larger sample size is warranted.


Introduction
Nonalcoholic fatty liver disease (NAFLD) has become the most common cause of chronic liver disease in western countries, with a prevalence of approximately 30% in the United States (Lazo et al. 2011). It is characterized by excessive fat accumulation in the liver, or hepatic steatosis. The clinical significance of steatosis is correlated with its histological severity. Nonalcoholic fatty liver disease can progress to a more severe disorder: the disease ranges from simple steatosis to steatosis with inflammation (nonalcoholic steatohepatitis or NASH), fibrosis, and cirrhosis (Adams et al. 2005). Between 2.9% and 20% of NAFLD patients will develop NASH, depending on the disease lead-time (Charlton et al. 2001; Lebovics and Rubin 2011). Once NASH develops, an estimated 17-49% of cases progress to cirrhosis; once cirrhosis is present, patients are at risk for liver decompensation and hepatocellular carcinoma (Bugianesi et al. 2002; Ratziu et al. 2002; Argo et al. 2009). Overall, only a small percentage of individuals with NAFLD progress to end-stage cirrhosis or liver cancer. However, given the total number of NAFLD cases, this still amounts to a large number of patients suffering from these significant morbidities.
Nonalcoholic fatty liver disease is frequently associated with several coexisting conditions including obesity, visceral fat, hyperlipidemia, and type 2 diabetes (Angulo 2002). Diabetes is not only a risk factor for NAFLD independent of obesity (Gupte et al. 2004); its presence also seems to compound the risk of histopathologic progression of steatosis among obese patients (Silverman et al. 1990). This follows logically, as NAFLD is closely associated with insulin resistance, although the direction of this association has not been established (Marchesini et al. 1999; Angulo 2002; Paschos and Paletas 2009; Fabbrini et al. 2010).
In addition to environmental risk factors, NAFLD may have genetic risk factors. Specifically, there appear to be differences in risk depending on ethnic background. Based on recent population-based studies in the United States, the prevalence of NAFLD was 30-32% overall, with rates of 39-45% among Hispanics, 30-33% among non-Hispanic Whites, and 23-24% among non-Hispanic Blacks (Browning et al. 2004; Smits et al. 2013). Multiple studies have indeed illustrated the lower disease burden and decreased histologic severity in the African-American population (Giday et al. 2006; Kallwitz et al. 2009; Mohanty et al. 2009). Interestingly, although individuals of Asian ancestry tend to have lower rates of obesity, they have high rates of metabolic syndrome and NAFLD (prevalence ranging from 15 to 45%) (Farrell et al. 2013; Wong and Ahmed 2014). Asians may even demonstrate a higher disease severity than whites (Mohanty et al. 2009). A number of studies have demonstrated that Hispanics in the United States have both a higher disease burden and greater disease severity than their counterparts of African-American and European ancestry (Clark et al. 2003; Neuschwander-Tetri and Caldwell 2003; Weston et al. 2005; Kallwitz et al. 2009; Sharp et al. 2009; Wagenknecht et al. 2009; Pan et al. 2011; Schneider et al. 2013). These studies investigated primarily Mexican Americans, but not Hispanics from other locations (Browning et al. 2004; Mohanty et al. 2009). This presents a complication when attempting to apply genetic and clinical principles derived from research in different Hispanic subpopulations. In the Bronx, for instance, 53.5% of the county's population is of Hispanic ethnicity, and over 80% of this group is of Caribbean (Dominican and Puerto Rican) ancestry (2008-2012 American Community Survey). One study recently acknowledged the shortcoming of these NAFLD-Hispanic investigations and analyzed rates of "suspected" NAFLD among different Hispanic populations. The authors found that rates of the condition differed between ancestries, with Hispanics of Cuban, Puerto Rican, and Dominican backgrounds showing a lower prevalence of NAFLD than those of Mexican heritage (Kallwitz et al. 2015).
Here, we performed a new study to investigate the potential genetic risk implicated in NAFLD within a population composed of Hispanics of mostly Caribbean ancestry in Bronx County. We compared the allele frequencies of genetic variations in candidate genes previously reported for NAFLD with those in our local population.

Materials and Methods
Human subjects and phenotype data
Blood or saliva samples were obtained from 40 patients with biopsy-proven NAFLD, with their informed consent (IRB # 11-06-247E). The 12 males and 28 females are all of self-identified Hispanic descent and are over the age of 18. Control samples (n = 24) were obtained from patients visiting various primary care, dermatology, and gastroenterology clinics at Montefiore Medical Center in the Bronx, New York. Control subjects included 12 females and 12 males self-identified as Hispanic. Control patients were excluded if they had any evidence of liver disease (abnormal abdominal imaging, AST/ALT levels, or clinical history), BMI >30, or other evidence of metabolic syndrome. Patients with isolated hypertension, hypercholesterolemia, or controlled diabetes mellitus were included based on clinical judgment.
For the Bronx population sample, blood samples were obtained from healthy parents of children seen at the Pediatric Genetics Clinic at Montefiore Medical Center for developmental delay, autism, or multiple congenital malformations (mostly sporadic). Their DNA was used as a proxy for a random sample of our Bronx population. Their ethnic makeup was selected to be reflective of that of Bronx County, NY: 43% Hispanic, 31% White, 15.9% Black, 5% Asian. Subjects were not screened for NAFLD or other exclusion criteria. These deidentified samples were also obtained with informed consent (IRB #1999-201).
The Gentra Puregene Genomic DNA Purification Kit (Gentra, Minneapolis, MN) was used to purify DNA in the Molecular Cytogenetics Core, Albert Einstein College of Medicine, NY, according to standard protocols. Specimens were derived from blood (purple top EDTA tubes) or saliva (collected in Oragene OR-250 kits). DNA quality was assessed by agarose gel electrophoresis, and DNA was quantified by NanoDrop, Qubit, and/or PicoGreen analysis.

Genotype analysis
Based on prior GWAS and candidate gene studies, we compiled a list of 87 target SNPs in 52 genes associated with the development of NAFLD or with disease severity. Seventy-four SNPs (Table 1) were included in the study design. The remaining 13 SNPs were rejected because another SNP lay within 20 bases of the target SNP or blocked the design of an extension primer. We subsequently genotyped the selected SNPs using Sequenom MassArray technology.
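The distance-based part of this selection rule can be sketched in a few lines of Python. This is a minimal illustration, not the assay-design software used in the study: the function name, the example coordinates, and the reading of "within 20 bases" as a distance of 20 or fewer bases are all assumptions, and the second criterion (a neighboring SNP blocking extension-primer design) is not modeled.

```python
def has_close_neighbor(target_pos, known_snp_positions, window=20):
    """Return True if another known SNP lies within `window` bases of the target.

    Positions are hypothetical base coordinates on the same chromosome.
    """
    return any(
        pos != target_pos and abs(pos - target_pos) <= window
        for pos in known_snp_positions
    )


# Hypothetical coordinates, for illustration only (not study data).
known = [1000, 1012, 5000, 5040]
rejected = [pos for pos in known if has_close_neighbor(pos, known)]
kept = [pos for pos in known if not has_close_neighbor(pos, known)]
print("rejected:", rejected)
print("kept:", kept)
```

In this toy input, the two SNPs 12 bases apart are flagged, while the two SNPs 40 bases apart pass the distance check.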
Additionally, we sequenced the PNPLA3 (OMIM # 609567) gene, including SNP rs738409 (NM_025225.2: c.444C>G) in exon 3, which is the variation most consistently and significantly associated with the development of NAFLD (Anstee and Day 2013). To detect sequence variations, we amplified the nine coding exons of PNPLA3 using PCR followed by Sanger DNA sequencing. Primers were designed specifically for each exon, based on the GenBank reference sequence (Table S1). Sequence data from the PCR-amplified exons were compiled using Sequencher 4.0.1 software (Gene Codes, Ann Arbor, MI) and compared to reference sequence NM_025225.2.

Statistical analysis
Allele frequencies of the variations were estimated by the gene-counting method. Agreement with the Hardy-Weinberg equilibrium was tested by the chi-square test. Allele frequencies were also compared between groups using the chi-square test. PLINK software was used to evaluate all Sequenom results. All other analyses were performed in Microsoft Excel.
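For readers unfamiliar with these tests, the gene-counting estimate and the Hardy-Weinberg chi-square test can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the PLINK or Excel procedure used in the study: the function names and genotype counts are hypothetical, and the P-value uses the closed form for a chi-square distribution with 1 degree of freedom (biallelic SNP).

```python
import math


def allele_frequencies(n_aa, n_ab, n_bb):
    """Gene-counting estimate: each homozygote contributes two copies of its
    allele and each heterozygote contributes one copy of each allele."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    p = (2 * n_aa + n_ab) / total_alleles
    return p, 1.0 - p


def chi2_pvalue_1df(stat):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def hardy_weinberg_test(n_aa, n_ab, n_bb):
    """Chi-square test of observed genotype counts against Hardy-Weinberg
    expectations (p^2, 2pq, q^2) for a biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p, q = allele_frequencies(n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return stat, chi2_pvalue_1df(stat)


# Hypothetical genotype counts (AA, AB, BB), not taken from the study.
stat, p_value = hardy_weinberg_test(30, 40, 30)
print(f"chi-square = {stat:.2f}, P = {p_value:.4f}")
```

With these invented counts, both alleles have frequency 0.5, the expected genotype counts are 25, 50, and 25, and the deficit of heterozygotes gives a chi-square statistic of 4.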

Target variation analysis
Sequenom genotyping and PNPLA3 sequencing were used to determine allele frequencies of the target SNPs within NAFLD cases, controls, and our sample Bronx population. Reference minor allele frequencies (MAFs) were obtained from the 1000 Genomes Project Phase 1 data for the total population and for the subpopulation of Puerto Ricans from Puerto Rico. Complete results are included in Table S2. Nine variations are listed in Table 2 based on significant P-values (≤0.05) across multiple avenues of comparison. The SNPs in PNPLA3 and SAMM50 (OMIM # 612058) (rs738409, rs2896019; NM_025225.2:c.979+542T>G, rs3761472; NM_015380.4:c.329A>G) were previously identified in GWAS and generally had increased frequencies in the NAFLD population compared with our controls, the sample Bronx population, and expected population frequencies. Increased MAF of the alternative allele of SNPs rs738409 and rs2896019 in PNPLA3 was found in our Bronx population sample as compared to the MAF recorded for the Puerto Rican population. Similarly, the MAF of the alternative allele was significantly increased for the disease-related SNPs rs1044498 (NM_006208.2: c.517A>C) in ENPP1 (OMIM # 173335) and rs17222723 (NM_000392.4:c.3563T>A) in ABCC2 (OMIM # 601107). The MAFs of the alternative alleles of rs17222723 and rs1044498 were significantly higher (P < 0.05) in both the NAFLD cases and the Bronx population than expected from population data, but did not differ significantly between NAFLD cases and the Bronx population. The SNP rs1044498 (ENPP1) was most significant (P = 0.02) when comparing the Bronx population with the Puerto Rican population.
Common SNP variations in an intron of CHUK (OMIM # 600664) (rs11597086; NM_001278.3:c.1974+36T>G) and an exon of ERLIN1 (OMIM # 611604) (rs2862954; NM_001100626.1:c.871A>G) occurred less frequently in both the NAFLD cases and the sample Bronx population than would be expected in a random Puerto Rican population. The intronic CHUK SNP variant rs11591741 also occurred less frequently in our Bronx population than would be expected in a random Puerto Rican population (31 vs. 43%, P = 0.03); however, it occurred more often than would be expected in the general population (31 vs. 23%, P = 0.004).

Discussion
NAFLD is a highly prevalent disease that is the most common cause of liver disease worldwide. With its connection to obesity and metabolic syndrome, NAFLD and its more severe form, NASH, are projected to become an increasing public health burden. Although NAFLD occurs with increased incidence and severity in the Mexican Hispanic population, the principal aim of this study was to explore previously discovered NAFLD variants in our largely Caribbean-Hispanic population. The NAFLD patients seen at Montefiore Medical Center are of mostly Dominican or Puerto Rican ancestry, populations highly affected with fatty liver disease and, to our knowledge, never studied in this context. We performed liver biopsies as part of the clinical evaluation of these patients. Additionally, restricting our analysis to previously discovered and/or validated genes and variants related to fatty liver disease provides more power. This may strengthen any inferences drawn from our limited number of samples. Indeed, the greatest weakness of our study is the relatively small number of patients for the examination of a common disease. Therefore, the purpose of our discussion is to provide a framework for future analyses. All results of our experiments are suggestive, but not intended to be conclusive.
As discussed previously, NAFLD is a complex disease with a multifactorial etiology associated with diet, obesity, and a variety of comorbidities related to insulin resistance (Marchesini et al. 1999). Although environmental factors are important in determining risk for the disease, evidence from familial and twin studies supports the assumption that genetics is an important modulator of NAFLD development and disease progression (Brouwers et al. 2006; Makkonen et al. 2009; Schwimmer et al. 2009). The most studied and replicated variant is rs738409 (NM_025225.2: c.444C>G) in PNPLA3. This common missense variant has demonstrated increased risk for the development of NAFLD independent of diabetes or obesity and is also significantly associated with the degree of histological severity (Rotman et al. 2010; Speliotes et al. 2010; Valenti et al. 2010a,b). The strongest effect of the variant appears to exist within the Hispanic population versus African-American or European-American individuals (Romeo et al. 2008). The overall strength of the association between this polymorphism and NAFLD has recently been confirmed in a genetic meta-analysis (Sookoian and Pirola 2011).
Although the missense SNP rs738409 has been repeatedly evaluated and tested, to our knowledge the entire coding region of PNPLA3 had never been sequenced in the Hispanic population with respect to NAFLD. Our study demonstrated that the minor G allele is more common in our Caribbean-Hispanic patients with NAFLD than in any other group analyzed. This included ethnically matched controls (OR 2.95, P = 0.003), the sample Bronx population (OR 3.01, P = 4.4E-7), the general population (OR 1.97, P = 0.0001), and even the average Puerto Rican population (OR 1.56, P = 0.05).
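As an illustration of how an allelic odds ratio and its chi-square P-value of this kind are obtained from a 2x2 table of allele counts, consider the Python sketch below. The function name is hypothetical and the allele counts are invented for illustration; they are not the study's genotype data, and the sketch is not PLINK's implementation. It assumes a Pearson chi-square test without continuity correction (1 degree of freedom).

```python
import math


def allelic_association(case_minor, case_major, ctrl_minor, ctrl_major):
    """Return (odds ratio, chi-square statistic, P-value) for a 2x2 table of
    allele counts: rows are cases/controls, columns are minor/major allele."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table, no continuity correction.
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return odds_ratio, stat, p_value


# Invented allele counts: 40 cases (80 alleles) and 24 controls (48 alleles).
odds_ratio, stat, p_value = allelic_association(48, 32, 16, 32)
print(f"OR = {odds_ratio:.2f}, chi-square = {stat:.2f}, P = {p_value:.4f}")
```

Each individual contributes two alleles, so the table totals are twice the number of subjects in each group.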
Mutations in PNPLA3 provide a possible genetic basis for the underlying mechanisms in the genesis of NAFLD. The PNPLA3 gene is located on chromosome 22q13.3 and encodes a membrane-bound protein mediating triacylglycerol hydrolysis in adipocytes. Its expression is highest in human liver tissue (Huang et al. 2010) and is induced during feeding and insulin resistance by fatty acids and other regulators of lipogenesis (Huang et al. 2010; Dongiovanni et al. 2013). The rs738409 missense variation specifically is assumed to promote triglyceride accumulation through relative inhibition of triglyceride hydrolysis. An understanding of the proposed pathophysiology of NAFLD illustrates how the function of PNPLA3 is consistent with a role in the progression of the disease. Although the exact pathogenesis of NAFLD is still under debate, there are a number of mechanisms that are clearly involved. These include insulin resistance, free fatty acid flux, endoplasmic reticulum stress, oxidative stress, and inflammation (Yoon and Cha 2014). Simplistically, steatosis in the liver develops when the supply of fatty acids to the liver exceeds the requirements for mitochondrial oxidation and synthesis of phospholipids, triglycerides, and cholesterol (Lall et al. 2008). Triglyceride accumulation results from either lipid uptake in the liver or de novo synthesis in the setting of excess carbohydrates (Kawano and Cohen 2013). Insulin resistance has repeatedly been implicated as an important cause of lipid accumulation in the liver (Marchesini et al. 1999; Sanyal et al. 2001; Chitturi et al. 2002; Pagano et al. 2002). Once the liver is overwhelmed by lipid accumulation, the mitochondria attempt to remove the fatty acids through oxidation. However, this process can inadvertently cause oxidative stress and mitochondrial dysfunction through excessive production of reactive oxygen species (ROS) (Rolo et al. 2012; Yoon and Cha 2014). This hepatocellular injury is further exacerbated by the secretion of inflammatory mediators (e.g., TNF-α, IL-6, and NO) that are induced through the presence of adipose tissue in the liver and basal insulin resistance (Carter-Kent et al. 2008; Hijona et al. 2010; Odegaard and Chawla 2011; Yoon and Cha 2014). This proinflammatory state and resultant fibrosis/necrosis define the histopathological spectrum of NAFLD.
Many of the genes identified in our study as having variations with multiple significant results have functions that fit the pathogenesis of NAFLD and may logically influence disease susceptibility or progression. These particularly include SAMM50, ENPP1, and ABCC2. The function of ENPP1 can be directly linked to hepatic steatosis, as it codes for a membrane glycoprotein that inhibits insulin signaling. The rs1044498 SNP variation is a gain-of-function mutation that causes overexpression in peripheral insulin target tissues and is associated with human insulin resistance (Prudente and Trischitta 2006). If SAMM50 is involved in NAFLD, it is likely through its role in mitochondrial function. SAMM50 encodes a component of the SAM assembly complex, which helps integrate β-barrel proteins into the outer mitochondrial membrane. Any mitochondrial dysfunction and resultant decrease in removal of reactive oxygen species (ROS) caused by mutations in SAMM50 is consistent with a biochemical rationale for the importance of SAMM50 in NAFLD (Kitamoto et al. 2013). ABCC2 has a function that is further removed from a direct correlation with NAFLD. It encodes a protein expressed in the apical area of the hepatocyte that functions in biliary transport, likely critical to the elimination of conjugates of many toxins from hepatocytes into bile (Nies and Keppler 2007) and likely predisposing the liver toward injury from excessive adipose tissue (Sookoian et al. 2009).
There is one gene cluster, ERLIN1-CHUK-CWF19L1, which revealed associations with SNPs that appear to confer protection in our study. The relation of these genes to NAFLD is theoretically logical, as ERLIN1 is a component of lipid rafts (Browman et al. 2006) and CHUK proteins modulate NF-κB activation of several genes involved in insulin resistance (Yuan et al. 2008). For the SNPs rs11597086 and rs2862954, the rate of polymorphism in our sample Bronx population was lower than in the average Puerto Rican population (OR 0.54, P = 0.003; OR 0.54, P = 1.5E-3). In our study, these variations also occur less frequently in NAFLD cases than would be expected in a random Puerto Rican population (OR 0.61, P = 0.09; OR 0.60, P = 0.07). The fact that these polymorphisms only seem to confer protection when compared with the Puerto Rican community suggests that there is some other genetic modifier in the Hispanic subpopulation that interacts with these genes to protect patients from the development of hepatic steatosis. This phenomenon is interesting and warrants further investigation.
In conclusion, this study suggests significant interactions of variants in PNPLA3, ERLIN1-CHUK, SAMM50, ENPP1, and ABCC2 with NAFLD in Hispanics of predominantly Caribbean ancestry. An interaction of these genes with NAFLD is pathophysiologically plausible, and the extent of the impact of these variations in NAFLD generally and in Hispanic subpopulations deserves further analysis.