Introduction

Recently, nonalcoholic fatty liver disease (NAFLD) was recognized as an important health concern.1, 2 NAFLD includes simple steatosis, nonalcoholic steatohepatitis (NASH), fibrosis/cirrhosis and hepatocellular carcinoma. The frequency of patients presenting with NAFLD has increased gradually in Japan in proportion to the increase in the population with metabolic syndrome.3 NAFLD is observed in 20–30% of the population in Japan and 1–3% of these individuals are considered to have NASH, similar to American and European populations.3, 4

In addition to environmental factors, genetic factors are important in the development of NAFLD.5 In order to elucidate these genetic factors, we previously investigated candidate genes and found that variations in the peroxisome proliferator-activated receptor γ coactivator 1α (PPARGC1A), angiotensin II type 1 receptor (AGTR1), and nitric oxide synthase 2 (inducible) (NOS2) genes are associated with NAFLD in Japanese individuals.6, 7, 8 Genome-wide association studies have revealed that single-nucleotide polymorphisms (SNPs) in the patatin-like phospholipase domain containing 3 (PNPLA3) influence NAFLD and liver enzyme levels in the plasma.9, 10, 11, 12 We also reported that the risk allele (G-allele) of rs738409 in PNPLA3 is strongly associated with NAFLD, as well as with increases in aspartate aminotransferase (AST), alanine aminotransferase (ALT) and fibrosis stage in patients with NAFLD in the Japanese population.13

Recently, we performed a GWAS and found that polymorphisms in the SAMM50 sorting and assembly machinery component (SAMM50) and parvin-β (PARVB) genes, in addition to those in PNPLA3 were associated with the development and progression of NAFLD.14 Variations used for GWAS were limited and at least 10% of SNPs were monomorphic or rare SNPs in the Japanese population. Many variations in the NAFLD susceptibility genomic region have not been investigated yet. Here, we performed targeted resequencing to identify all common variations in the entire NAFLD susceptibility genomic region. We constructed a fine linkage disequilibrium (LD) map of SNPs and insertion/deletions in the region and analyzed their association with the development and severity of NAFLD.

Materials and methods

Study subjects

The entire study was conducted in accordance with the guidelines of the Declaration of Helsinki. Written informed consent was obtained from each subject, and the protocol was approved by the ethics committee of Kyoto University, Yokohama City University, Hiroshima University and Kurume University.

Control subjects (n=1012) included Japanese volunteers who had undergone medical examination for common disease screening. Japanese patients with NAFLD who underwent liver biopsy (488 with NASH and 52 with simple steatosis) were enrolled. Patients with the following diseases were excluded from the study: viral hepatitis (hepatitis B and C, Epstein–Barr virus infection), autoimmune hepatitis, primary biliary cirrhosis, sclerosing cholangitis, hemochromatosis, α1-antitrypsin deficiency, Wilson’s disease, drug-induced hepatitis and alcoholic hepatitis (present or past daily consumption of more than 20 g alcohol per day). None of the patients showed clinical evidence of hepatic decompensation, such as hepatic encephalopathy, ascites, variceal bleeding or a serum bilirubin level greater than twofold the normal upper limit.

Liver biopsy tissues were stained with hematoxylin and eosin, reticulin and Masson’s trichrome stain. The histological criterion for NAFLD diagnosis is macrovesicular fatty change in hepatocytes with displacement of the nucleus toward the cell edge.15 Patients who had more than 5% of hepatocytes affected by macrovesicular steatosis were diagnosed as having either steatosis or NASH. The minimal criteria for the diagnosis of NASH were the presence of >5% macrovesicular steatosis, inflammation and liver cell ballooning, typically with predominantly centrilobular (acinar zone 3) distribution.16, 17 The degree of steatosis was graded as follows based on the percentage of hepatocytes containing macrovesicular fat droplets: grade 0, no steatosis; grade 1, <33% hepatocytes containing macrovesicular fat droplets; grade 2, 33–66% of hepatocytes containing macrovesicular fat droplets; and grade 3, >66% of hepatocytes containing macrovesicular fat droplets.18 The hepatitis activity (necroinflammatory grade) was also determined on the basis of the composite NAFLD activity score: the unweighted sum of the scores for steatosis, lobular inflammation and hepatocellular ballooning (ranging from 0 to 8).19 Fibrosis severity was scored according to the method of Brunt18 and was expressed on a 4-point scale as follows: 0, none; 1, perivenular and/or perisinusoidal fibrosis in zone 3; 2, combined pericellular portal fibrosis; 3, septal/bridging fibrosis; and 4, cirrhosis.

Clinical and laboratory evaluation

The weight and height of the patients were measured using a calibrated scale after removing shoes and heavy clothing. Venous blood samples were obtained from subjects after overnight fasting (12 h) to measure plasma glucose, hemoglobin A1c, total cholesterol, high-density lipoprotein cholesterol, triglycerides, serum AST and ALT levels. All laboratory biochemical parameters were measured using conventional methods.

Targeted resequencing

Genomic DNA was extracted using Genomix (Talent Srl, Trieste, Italy), from blood samples collected from each subject. Genomic DNA of 28 randomly selected NAFLD patients was used for targeted resequencing. Clinical information of these 28 patients is shown in Supplementary Table 1. The genomic region from 44,317,888 to 44,425,903 on chromosome 22, which includes PNPLA3, SAMM50 and PARVB, was amplified using region-specific primers with a long-range PCR approach (Supplementary Table 2). Amplification was performed using the GeneAmp 9700 PCR System (Life Technologies, Carlsbad, CA, USA), with 50 ng of genomic DNA, 1 × PCR buffer (TOYOBO, Osaka, Japan), 0.4 mM of each dNTP, 0.3 μM of each primer and 0.02 U of KOD FX Neo (TOYOBO, Osaka, Japan) in a 14-μl reaction volume. After an initial denaturation at 94 °C for 2 min, PCR was performed with 35 cycles, each consisting of 98 °C for 10 s and 68 °C for 5–8 min, depending on the length of the amplicons. The PCR extension time is shown in Supplementary Table 2. All 16 PCR amplicons were confirmed as a single peak (Supplementary Figure 1), and the molar concentration was measured using an Agilent DNA 12000 Kit and 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA); the amplicons were then pooled at equal concentration for all individuals.

Pooled PCR products were shared using a Covaris E210 (COVARIS, Woburn, MA, USA) under the following conditions: 10% duty cycle, intensity of 5.0, 200 cycles per burst, 40-s duration and frequency sweeping mode. Each sample was dual indexed and DNA libraries were prepared using the TruSeq DNA Sample Preparation kit v2 (Illumina, San Diego, CA, USA), according to the manufacturer’s protocol. Sequencing was performed in a MiSeq system (Illumina) using 150-bp paired-end reads. We aligned the paired-end reads to the human genome (UCSC hg19) as a reference, using BWA (version 0.5.9rc1; http://www.bio-bwa.sourceforge.net/).20 For sequence data conversion, sorting and indexing, we used SAMtools (version 0.1.17; http://www.samtools.sourceforge.net/) and Picard (version 1.72; http://www.picard.sourceforge.net).21 The variants, including insertion/deletions and SNPs, were determined using GATK (version 1.6-5-g557da77; http://www.broadinstitute.org/gatk/).22 ANNOVAR (version 2012-05-25; http://www.openbioinformatics.org/annovar/) was used to appropriately annotate genetic variants obtained.23 The variations were confirmed using IGV (Supplementary Figure 2) (version 1.5.65, http://www.broadinstitute.org/igv/).24, 25

Fine LD mapping of the PNPLA3, SAMM50 and PARVB loci

Invader probes (Third Wave Technologies, Madison, WI, USA) were constructed for 192 SNPs and 9 insertion/deletions located on chromosome 22:44,321,410 to 44,417,970, which had minor allele frequencies >0.05, as estimated from the results of targeted resequencing. An invader probe for one insertion/deletion (rs67877195) could not be constructed. SNPs and insertion/deletions were genotyped using Invader assays, as previously described.26 Among the 200 variations, 173 SNPs and 7 insertion/deletions were successfully genotyped. The success rates of the 180 Invader assays were >99.0% and the concordance rate of these results with those of sequencing was >99.4%.

Statistical analysis

LD was drawn using HaploView (http://www.broad.mit.edu/mpg/haploview/).27 We categorized the genotypes as 0, 1 or 2, depending on the number of copies of risk alleles present. Odds ratios and P-values, adjusted for age, sex, body mass index (BMI) and the presence of type 2 diabetes mellitus, were calculated using multiple logistic regression analysis. Multiple linear regression analyses were performed to test the independent effect per allele of each SNP on biochemical traits and histological parameters, accounting for effects of the other variables (that is, age, sex, BMI and the presence of type 2 diabetes mellitus). BMI, AST and ALT values were logarithmically transformed before performing multiple linear and logistic regression analyses. Statistical analyses were performed using PLINK 1.07 (http://pngu.mgh.harvard.edu/purcell/plink)28 and R software (http://www.r-project.org/). P-values less than 3.0 × 10−4 (0.05/169) was considered to be significant.

Results

Targeted resequencing

GWAS indicated that a strong NAFLD susceptibility locus is present in the region from 4.3 kb downstream of the PNPLA3 (NM_025225.2) transcription start site to 7.5 kb downstream of the PARVB isoform a (NM_001003828.2) transcription start site.11, 12, 14 We amplified the 100-kb genomic region, from 1.7 kb upstream of PNPLA3 to 30 kb downstream of the PARVB isoform a transcription start site, using long-range PCR (3.8–13.6 kb, Supplementary Table 2). Sixteen PCR products covered the entire genomic region.

PCR amplicons prepared from 28 NAFLD patients were sequenced using indexed multiplex next-generation sequencing. The average depth of the individual sequencing ranged from 387 to 652. An example of this sequencing is shown in Supplementary Figure 1. All observed variations are listed in Supplementary Table 3. A total of 329 variations were found and most of these (325 variations) were already registered in the dbSNP database. Insertion/deletions accounted for 16 variations, and SNPs for 313. Four novel SNPs were found in this study (ss831884188, ss831884189, ss831884190 and ss831884191). No clear genomic structural variations (>50 nucleotides in length) were observed. A 13-nucleotide insertion/deletion (rs140963094) was the longest variation.

Fine variation LD mapping

According to the HapMap database, SNPs from rs58002102 (chromosome 22:44,321,410) to rs135114 (chromosome 22:44,417,970) were sufficient to produce a fine LD map in the NAFLD susceptibility genomic region. There were 280 variations in this region (Supplementary Table 3), including 192 SNPs and 9 insertion/deletions with minor allele frequencies >0.05 in the 28 NAFLD patients. The latter 201 variations with minor allele frequencies >0.05 were genotyped in 540 NAFLD patients and 1012 control subjects using Invader assays; 173 SNPs and 7 insertion/deletions (rs5845621, rs71218095, rs71313378, rs140963094, rs10656207, rs76409096 and rs34505405) were successfully identified. Genotyped variations and their allele frequencies are shown in Supplementary Tables 3 and 4.

We used 162 SNPs and 7 insertion/deletions that had minor allele frequencies >0.10 for further analysis. Fine LD mapping using 169 variation data derived from 540 NAFLD patients is shown in Figure 1 and r2 values in control and NAFLD subjects are given in Supplementary Figure 2. HaploView analysis showed that four LD blocks existed in the NAFLD susceptibility genomic region in this population: a 1-kb LD block 1 (rs738407–rs2006943), a 21-kb LD block 2 (rs139051–rs13054885), a 48-kb LD block 3 (rs7289329–rs1007863) and a 2-kb LD block 4 (rs5764455–rs6006611). LD block 1 and block 2 occurred within PNPLA3, block 3 within SAMM50 and block 4 within PARVB (Figure 1). LD blocks were the same in the NAFLD patients and control subjects (Supplementary Figure 2). Seven insertion/deletions were in LD with the SNPs.

Figure 1
figure 1

Fine linkage disequilibrium (LD) mapping around PNPLA3, SAMM50 and PARVB. LD coefficients (r2) between every pair of variations were calculated. LD blocks are shown in red triangles; genomic structure is shown on top.

Association of variations with severity of NAFLD

First, we examined the association of the above 169 variations with simple steatosis and NASH. Clinical characteristics of the simple steatosis and NASH patients are shown in Table 1. We performed multiple logistic regression analysis using genotypes, age, sex, BMI and the presence of type 2 diabetes as independent variables. Two SNPs, namely, rs6006610 (P=6.1 × 10−5) and rs6006611 (P=7.1 × 10−6) in LD block 4, and rs2006943 (P=5.6 × 10−5) in LD block 1, were significantly associated with NASH as compared with simple steatosis (Figure 2, Supplementary Table 4). Relatively high odds ratios were observed: rs6006611, 2.68 (1.74–4.12); rs6006610, 2.37 (1.55–3.62); rs2006943, 2.54 (1.62–4.00) (OR (95% confidence interval)). As the sample size of simple steatosis is small (n=52), we compared 540 NAFLD patients and 488 NASH patients with 1012 control subjects. As reported previously, most of the variations in LD blocks 1–4 were significantly associated with NAFLD (Figure 2, Supplementary Table 4). Stronger association and higher odds ratios between NASH and control individuals were observed in LD block 1–4, especially in block 4 (Figure 2). The risk allele frequencies of these SNPs were highest in the NASH group, and those of the simple steatosis and control group were very similar (Supplementary Table 4). No associations were observed for the simple steatosis and control groups (P>0.02). These data suggested that the variations in LD block 4 are more strongly associated with NASH, rather than with NAFLD and simple steatosis.

Table 1 Clinical characteristics of subjects
Figure 2
figure 2

Regional plots of the −log10(P-value) for the association of variations. Variations are plotted by their position on chromosome 22 against their association with nonalcoholic steatohepatitis (NASH) compared with simple steatosis, nonalcoholic fatty liver disease (NAFLD) to control and NASH to control. Data were derived by logistic regression analysis, adjusted for age, sex, logarithmically transformed body mass index and the presence of type 2 diabetes. The positions of genes as well as the direction of transcription are shown above the plots. Black, red, blue and green closed circles are variations in linkage disequilibrium (LD) block 1, 2, 3 and 4, respectively. Gray closed circles are variations in other LD blocks.

Next, we examined the association of the 169 variations with histological phenotypes. Linear regression analysis, adjusted for age, sex, BMI and the presence of type 2 diabetes, revealed that no variations were significantly associated with lobular inflammation (Figure 3, Supplementary Table 5). Only one SNP (rs12484700) was significantly associated with steatosis grade (P=2.2 × 10−4). Two SNPs (rs6006610, P=1.9 × 10−4; rs6006611, P=4.4 × 10−5) in LD block 4 were significantly associated with hepatocyte ballooning. Many variations in LD blocks 2, 3 and 4 were associated with a higher NAFLD activity score (Figure 3, Supplementary Table 5). The most significant variation was rs6006611 (P=3.4 × 10−6) in LD block 4. Only rs734561 (P=2.6 × 10−4) and rs2006943 (P=1.9 × 10−4) in LD block 1 were significantly associated with a higher fibrosis stage (Figure 3, Supplementary Table 5).

Figure 3
figure 3

Regional plot for histological scores. Data were derived by linear regression analysis for individual histological scores for steatosis grade, lobular inflammation, hepatocyte ballooning, nonalcoholic fatty liver disease activity score (NAS) and fibrosis stage. Each phenotype was adjusted for age, sex, logarithmically transformed body mass index and the presence of type 2 diabetes. The positions of genes as well as the direction of transcription are shown above the plots. Black, red, blue and green closed circles are variations in LD block 1, 2, 3 and 4, respectively. Gray closed circles are variations in other linkage disequilibrium (LD) blocks.

Finally, we investigated the association of these variations with AST and ALT levels. We observed that many of the variations in LD block 2 were significantly associated with both AST and ALT levels (Figure 4, Supplementary Table 5). Many variations in LD blocks 1, 3 and 4 were also significantly associated with AST and ALT levels; however, these associations were much stronger with variations in LD block 2.

Figure 4
figure 4

Regional plot for aspartate aminotransferase (AST) and alanine aminotransferase (ALT). Data were derived by linear regression analysis. Values of AST and ALT levels were logarithmically transformed. Each phenotype was adjusted for age, gender, logarithmically transformed body mass index and the presence of type 2 diabetes. The positions of genes as well as the direction of transcription are shown above the plots. Black, red, blue and green closed circles are variations in linkage disequilibrium (LD) block 1, 2, 3 and 4, respectively. Gray closed circles are variations in other LD blocks.

Discussion

After the first report of an association of PNPLA3 rs738409 with NAFLD,9 many replication studies were performed in various populations,10, 11, 12, 13, 14, 29, 30 verifying the importance of rs738409 in the development of NAFLD. The LD block containing rs738409 spans 80 kb in the Japanese population. Many variations in the LD block containing rs738409 are registered in the dbSNP database; however, many of these were invalid, and no genomic structural variations were reported. Therefore, we performed targeted resequencing to identify and confirm variations in this region. We did not find any genomic structural variations in this region. For sequencing, only 28 NAFLD subjects were used; it is therefore possible that genomic structural variations may occur in this region at a low frequency. Using a combination of targeted resequencing and genotyping methods, we determined the frequencies and the associations of most common variations in this region, including insertion/deletions, with the severity of NAFLD. This approach would be useful for investigating other common disease susceptibility loci in detail.

Fine LD mapping revealed that 4 LD blocks covered the NAFLD susceptibility locus and that PNPLA3 was contained in LD blocks 1 and 2, SAMM50 in LD block 3 and PARVB in LD block 4. As LD blocks were separated clearly by genes, the genetic effect of each gene on the development and progression of NAFLD could be analyzed. In this study, we showed that many variations in PNPLA3 (LD blocks 1 and 2), as well as rs738409 in LD block 2, were significantly associated with NASH, and with the severity of NAFLD (NAFLD activity score, AST and ALT). These associations were stronger with the 5′-region of LD block 2. Fibrosis was significantly associated with variations in LD block 1, which did not contain rs738409. The variant, rs738409, is a C (non-risk allele) to G (risk allele) substitution that changes codon 148 from isoleucine (I) to methionine (M). Interestingly, PNPLA3-deficient mice did not show a fatty liver,31, 32 but hepatic overexpression of PNPLA3I148M led to development of a fatty liver, although overexpression of wild-type PNPLA3 in mouse liver did not.33 Our study and other studies in mice suggest that variations in LD with rs738409 and/or variations in LD block 1 of PNPLA3 could increase the transcriptional level of mRNA containing the risk allele of rs738409, leading to the development of NAFLD. LD block 2 contains the insertion/deletion rs71218095 that is located in intron 5 of PNPLA3, near exon 6 (pairwise LD statistics r2=0.93 with rs738409). The effect of the insertion/deletion would affect transcription of mRNA more strongly.

Overexpression of PNPLA3I148M in mice led to the development of fatty liver, but not to NASH and metabolic disorders,33 suggesting the contribution of other genes to the latter conditions. In our study, the strongest associations were observed between variations in PARVB (LD block 4) and NASH, and histological severity (NAFLD activity score). PARVB has at least four transcript variants; LD block 4 is located in the 2nd intron of the longest transcript variant of PARVB, corresponding to the 5′-flanking region of the shorter transcript variants. PARVB encodes parvin-β, which forms part of an integrin-linked kinase–pinch–parvin complex that transmits signals from integrin to Akt/protein kinase B.34 Overexpression of parvin-β increases mRNA expression levels, phosphorylation at serine 82 and activity of the peroxisome proliferator-activated receptor γ, leading to a concomitant increase in lipogenic gene expression.35 Another report has indicated that overexpression of parvin-β promotes apoptosis.36 Increased lipogenesis and apoptosis are considered to underlie fibrosis in NAFLD;37, 38 therefore, variations in PARVB (LD block 4) could increase the transcriptional level of parvin-β, leading to increased lipogenesis and apoptosis, resulting in the progression of fibrosis.

In conclusion, both our previous GWAS and the present study indicate that PNPLA3 is most important in the development of simple steatosis. We also showed here that PARVB, in addition to PNPLA3, has an important role in the progression of simple steatosis to NASH. The limitation of this study was a small sample size of individuals with simple steatosis; therefore, further studies would be necessary. Nevertheless, the approach used here of a combination of targeted resequencing and genotyping methods could be applied for the detailed investigation of other common disease susceptibility loci.