Association of NAFLD with FGF21 Polygenic Hazard Score, and Its Interaction with Protein Intake Level in Korean Adults

Fibroblast growth factor 21 (FGF21) is a hormone that participates in the regulation of energy homeostasis and is induced by dietary protein restriction. Preclinical studies have suggested that FGF21 induction exerts a protective effect against non-alcoholic fatty liver disease (NAFLD), while human studies have revealed elevated levels of and potential resistance to FGF21 in patients with NAFLD. However, whether the FGF21 pathway also contributes to NAFLD risk at the genetic level remains uncertain. A few attempts to investigate the impact of individual genetic variants at the loci encoding FGF21 and its receptors on NAFLD risk have failed to establish a clear association due to a limited effect size. Therefore, this study aimed to (1) develop a polygenic hazard score (PHS) for FGF21-related loci that are associated with NAFLD risk and (2) investigate the effect of its interaction with protein intake level on NAFLD risk. Data on 3501 participants of the Korean Genome Epidemiology Study (Ansan–Ansung) were analyzed. Eight single-nucleotide polymorphisms of fibroblast growth factor receptors and beta-klotho were selected for PHS determination using forward stepwise analysis. The association between the PHS and NAFLD was validated (p-trend: 0.0171 for men and <0.0001 for women). Moreover, the association was significantly modulated by the protein intake level in all participants as well as women (p-interaction = 0.0189 and 0.0131, respectively) but not in men. In particular, the women with the lowest PHS values and a protein intake lower than the recommended nutrient intake (RNI) exhibited a greater NAFLD risk (HR = 2.021, p-trend = 0.0016) than those with an intake equal to or greater than the RNI; however, those with higher PHS values had a high risk, regardless of protein intake level. These findings demonstrate the contribution of FGF21-related genetic variants and restricted protein intake to NAFLD incidence.


Introduction
Non-alcoholic fatty liver disease (NAFLD) comprises a range of liver conditions caused by extra fat buildup in the liver without significant consumption of alcohol or lipid-causing drugs, viral infection, and/or inherited genetic diseases [1]. Its prevalence has been gradually increasing, currently affecting approximately 30% of the global adult population [2]. The prevention and treatment of NAFLD is an important public health concern, since NAFLD is closely associated with the risks of other metabolic disorders, including obesity, hyperlipidemia, and diabetes, and potentially leads to severe liver damage, such as cirrhosis or hepatocellular carcinoma [3]. Considering the tight association between metabolic dysregulation and fatty liver diseases, an attempt has recently been made to rename NAFLD to metabolic associated fatty liver disease (MAFLD) [4,5]. NAFLD often develops in people with obesity and diabetes; nevertheless, approximately 40% of patients with NAFLD in Korea are non-obese or lean [6]. Therefore, NAFLD is possibly not only

Study Participants
This study used data from the prospective, population-based Ansan and Ansung cohorts, which are part of the Korean Genome Epidemiology Study [44]. This data set was provided by the Center for Genome Science, National Institute of Health, Korea Disease Control and Prevention Agency, Republic of Korea. The Ansan-Ansung cohort study is a longitudinal study that investigates the genetic and environmental causes of common metabolic and cardiovascular diseases. The cohort survey was performed biennially until 2012. As the third survey (2005)(2006) provided the most detailed dietary information of the participants, we used the third survey as a baseline and included data up to the sixth survey (2011-2012) for analysis.
Nutrients 2023, 15, x FOR PEER REVIEW 3 of 19 modulated by protein intake level, which is a strong FGF21-pathway stimulus. To investigate these associations, we (1) developed a polygenic hazard score (PHS) via the discovery and combination of multiple-loci SNPs related to the FGF21 pathway based on a longitudinal cohort of the Korean population, (2) analyzed the risk of NAFLD incidence based on the PHS, and (3) explored the possible modifying effect of protein intake.

Study Participants
This study used data from the prospective, population-based Ansan and Ansung cohorts, which are part of the Korean Genome Epidemiology Study [44]. This data set was provided by the Center for Genome Science, National Institute of Health, Korea Disease Control and Prevention Agency, Republic of Korea. The Ansan-Ansung cohort study is a longitudinal study that investigates the genetic and environmental causes of common metabolic and cardiovascular diseases. The cohort survey was performed biennially until 2012. As the third survey (2005)(2006) provided the most detailed dietary information of the participants, we used the third survey as a baseline and included data up to the sixth survey (2011-2012) for analysis.
During the baseline survey, 7515 people aged 40-69 years living in Ansan and Ansung were enrolled in the study conducted by Korea University and Ajou University. All study participants were informed, and they provided informed consent prior to commencing the study. A total of 4014 participants were excluded before the investigation for the following reasons: (1) lack of information regarding their genetic information (n = 840), clinical data (n = 767), nutritional data (n = 20), hepatitis and diabetes status (n = 6), and NAFLD liver fat score (NLFS)-based diagnosis (n = 2) in the third survey (2005)(2006); (2) hepatitis diagnosis (n = 78); (3) alcohol consumption >140 g per week (n = 919); (4) the presence of cancer, including liver cancer (n = 66); (5) NLFS ≥-0.640 (n = 1080) in the third survey (2005)(2006); (6) total caloric consumption <500 or >5000 kcal/day (n = 16); and (7) lack of visit during the follow-up period (n = 288). Finally, 3501 participants were included in this study, as shown in Figure 1. The study was approved by the Institutional Review Board of Ewha Womans University, Seoul, Republic of Korea (IRB approval number: ewha-202105-0003-01).    calculated using the equation: weight (kg) divided by the square of the height (m 2 ). Waist circumference was measured at the middle area between the ribs and iliac crest, and the average of triplicate measurements was determined. Blood pressure was measured once from the right arm in a sitting position.
Smoking, diabetes mellitus (DM), and educational status variables were collected as well as frequency and amount of alcohol consumption. Smoking was classified into "never-smoker", "former smoker", and "current smoker". Participants diagnosed with DM were either clinically diagnosed or had Hba1c and plasma glucose levels ≥6.5% and ≥200 mg/dL, respectively, after a 2 h oral glucose tolerance test or a fasting plasma blood glucose level ≥126 mg/dL (https://www.diabetes.or.kr/pro/; accessed on 1 September 2021). Educational status was categorized into "elementary school or below", "middle school", "high school", and "college or above". Physical activity was evaluated as follows: 0 metabolic equivalent (MET) for no activity, 1.5 MET for motionless activity, 3 MET for light activity, 5 MET for intermediate activity, and 7 MET for strong activity [45].

Dietary Assessment
Dietary intake was evaluated using a semi-quantitative food frequency questionnaire (FFQ) containing 106 food items [46]. In the FFQ, participants recorded the average frequency and amount of intake over the past year. Caloric, protein, carbohydrate, and fat intakes were calculated as percentages and amounts in grams per day using previously calculated individual nutrient intakes. To determine the criteria for low and high protein intakes, protein-intake level was divided into two and three groups based on the Korean recommended nutrient intake (RNI, from the 2020 Korean Dietary Reference Intake [47]) and intake tertiles, respectively.

NAFLD Diagnosis Using the NLFS
Since results from the liver biopsy did not exist in the Ansan-Ansung study data, NAFLD diagnostic criteria was used instead. Among the various NAFLD diagnostic criteria, the NLFS was utilized for NAFLD diagnosis [48]. The following equation was applied: An NLFS value ≥ -0.640 indicated NAFLD.

PHS Development and Calculation for NAFLD
QC and the PHS were calculated using Plink version 1.9 (https://zzz.bwh.harvard. edu/plink/index.shtml; accessed on 1 September 2021). Before calculating the PHS, forward stepwise analysis was used to determine the best combination that reflected post-QC NAFLD risk. We determined the area under the curve (AUC) of the receiver operating characteristic (ROC) curve via forward stepwise analysis to confirm the SNP combination's suitability. The AUC of the ROC curve, as calculated using Proc logistic in SAS, was 0.5693. The PHS was calculated as the participant's genotype for eight selected SNPs and parameter estimates (β) from a Cox proportional hazards regression.

Expression Quantitative Trait Loci (eQTL) Analysis of Eight SNPs Using Genotype-Tissue Expression (GTEx)
eQTL analysis was performed using GTEx Projects (release version 8) [52]. We sought to determine whether the multiple SNP-affected tissues were related to NAFLD.

Statistical Analysis
All statistical analyses were performed using SAS software version 9.4 (SAS Institute, Inc., Cary, NC, USA), except for QC and PHS. Continuous and categorical variables are expressed as frequencies (%) and mean values (± standard deviations). For baseline analyses, the Mann-Whitney Wilcoxon test and chi-squared test were used to compare continuous and categorical variables, respectively. Cox Proportional Hazard Regression (Cox regression) was used to assess NAFLD incidence after adjusting for confounding variables. The probability in regression analysis was adjusted for sex, age, BMI, ALT level, physical activity, smoking status, educational level, hypertension, DM, hyperlipidemia, menopause (only for women), alcohol consumption, and total caloric intake according to previous studies [53][54][55][56]. The interaction between the PHS and NAFLD was examined using Cox regression, including the interaction term and the Wald test. The p-values for the trends across protein intake levels (equal to or above/below the RNI and tertiles) were calculated using a multivariable logistic regression model, with protein intake levels as continuous variables. The p-values for the trends between the PHS and NAFLD were determined using the generalized linear model after adjusting for the abovementioned confounding factors. Table 1 shows the participants' characteristics after categorizing them into non-incident (non-NAFLD group) and incident (NAFLD group) NAFLD groups at baseline. Among the 3501 participants, 1521 developed NAFLD within the follow-up period. The mean age was higher in the NAFLD group than in the non-NAFLD group. In terms of physiquerelated and biochemical information, the NAFLD group exhibited higher fasting blood glucose, insulin, AST, ALT, TCh, TG, LDL-C, and NLFS levels than the non-NAFLD group. Nevertheless, it is noteworthy that the mean biochemical-variable values spanned the normal reference-value range in both groups, except for TG, whose mean slightly exceeded the normal range in both groups (not shown). The number of patients with DM and hyperlipidemia, excluding those with hypertension, was higher in the NAFLD group than in the non-NAFLD group. Lifestyle and dietary habits, exercise, and carbohydrate intake (%) were more pronounced in the NAFLD group than in the non-NAFLD group.  (1) 12.9 ± 2.3 12.9 ± 2.4 0.3370 CHO (%) (1) 72.5 ± 6.4 73.1 ± 6.5 0.0024 Fat (%) (1) 13.5 ± 5.1 12.8 ± 5.1 <0.0001

Baseline Characteristics and Nutritional Intake
p-values were calculated by Mann-Whitney Wilcoxon test for continuous variables and chi-squared test for categorical variables. Significance was set at p < 0.05. Abbreviations: BMI, body mass index; HDL-C, high density lipoprotein cholesterol; LDL-C, low density lipoprotein cholesterol; AST, aspartate aminotransferase; ALT, alanine aminotransferase; MET-hr/wk, metabolic equivalent of task-hour/week; CHO, carbohydrate; NLFS, non-alcoholic fatty liver disease liver fat score. (1) Percentages of energy intake from CHO, protein and fat were calculated as follows: CHO intake and protein intake (g/day) × 4 kcal/total energy intake (kcal/day) × 100%, fat intake (g/day) × 9 kcal/total energy intake (kcal/day) × 100%.
The locations of the SNPs relevant to the genes are presented in Figure 2. The AUC value was verified using the ROC curve (AUC = 0.5693). The SNPs' β-adjusted values and risk alleles for NAFLD incidence are shown in Table 2.

Association between the PHS and NAFLD Incidence
We validated the PHS with the rate of NAFLD incidence. Among all participants, those with higher PHS values exhibited significantly higher hazard risks of NAFLD (p-trend < 0.0001); a similar trend was observed in male (p-trend = 0.0171) and female (p-trend < 0.0001) participants, as expected (Table 3).
In women, compared with the first quartile, all other PHS quartiles indicated a significantly higher risk, whereas in men, the third and fourth quartiles, but not the second, exhibited a higher risk. The increased hazard ratio (HR) was relatively greater in women than in men. (1) Adjusted for sex (male or female), age (years, continuous), BMI (continuous), ALT, physical activity, smoking status (never smoker, former smoker, or current smoker), education level (elementary or below, junior high school, high school, and college or above), diabetes mellitus (yes or no), hypertension (yes or no), hyperlipidemia (yes or no), alcohol intake (g/week, continuous) and total calorie intake (kcal/day, continuous). (2) Male was adjusted for (1) except for sex. (3) Female was adjusted for (1) plus menopause except for sex.

Association between the PHS and NAFLD Incidence by Protein Intake
Subsequently, we verified the association between protein intake and NAFLD risk prior to investigating the interaction. Table 4 shows the association between protein intake and the HR for NAFLD when protein intake level was divided into tertiles (low, medium, and high) or into two groups (intake ≥ or <RNI). Protein intake was not significantly associated with NAFLD risk after adjusting for confounding factors. In women, the unadjusted model and model 1 revealed that low protein intake appeared to be associated with the HR for NAFLD; however, the significance of this association disappeared in models 2 and 3.
Furthermore, we sought to determine whether the association between the FGF21related PHS and NAFLD risk varied with protein intake. The results showed that protein intake modified the association in women only (Table 5). Protein intake affected NAFLD risk in a varied manner depending on the PHS quartile (p-interaction = 0.0131 and 0.0361). The data were presented as HR (95% CI). (1) model 1: adjusted for sex, age, BMI, and ALT; model 2: further adjusted for physical activity, smoking status (never smoker, former smoker, or current smoker), education level (elementary or below, junior high school, high school, and college or above), diabetes mellitus, hypertension, hyperlipidemia, and alcohol intake (g/week); and model 3: further adjusted for total calorie intake (kcal/day). (2) Male was adjusted for (1) except for sex. (3) Female was adjusted for (1) plus menopause except for sex. The data were presented as HR (95% CI). (1) Adjusted for sex, age, BMI, ALT, physical activity, smoking status (never smoker, former smoker, or current smoker), education level (elementary or below, junior high school, high school, and college or above), diabetes mellitus, hypertension, hyperlipidemia, alcohol intake (g/week), and total calorie intake (kcal/day). (2) Male was adjusted for (1) except for sex. (3) Female was adjusted for (1) plus menopause except for sex.
In women only, with the lowest PHS values, NAFLD risk was significantly higher in the low-protein-intake group than in the high-protein-intake group (HR = 2.921, p-trend <0.0001). In contrast, women with the highest PHS values exhibited a marginally elevated risk in the medium-protein-intake group (HR = 1.435, p-trend = 0.0203). When categorizing protein intake based on the RNI, a differential effect of protein intake level was more pronounced. Women with the lowest PHS values had a high risk of NAFLD when consuming a protein level lower than the RNI (HR = 2.021, p-trend = 0.0016) compared with those with an intake ≥ the RNI, whereas those with higher PHS values exhibited a high risk, regardless of protein intake level. The results revealed the contribution of FGF21-related genetic variants and restricted protein intake to NAFLD incidence.

Potential Effects of Genetic Variants on Gene Expression
eQTL analysis was performed to determine whether the selected SNPs potentially affected gene expression in various tissues. eQTL information for only three of the eight SNPs (FGFR1 rs881301, FGFR2 rs9420328, and FGFR2 rs2420941) were available to demonstrate whether an SNP influences the expression level of one's corresponding gene in GTEx ( Figure 3). Intriguingly, the C allele of FGFR1 rs881301 (an NAFLD risk allele, shown in Table 2) was shown to significantly increase FGFR1 expression in various tissues, with the highest significance occurring in whole blood (p value = 1.92 × 10 −41 ) and the lowest in musculoskeletal and brain hypothalamus tissues (p values = 3.2 × 10 −5 and 2.01 × 10 −3 , respectively). However, the NAFLD risk alleles FGFR2 rs9410328 and FGFR2 rs242041 (C and T alleles, respectively) did not significantly affect the expression of their corresponding genes in musculoskeletal and brain hypothalamus tissues, except for a marginal effect of rs9410328 in musculoskeletal tissue. The results indicate that some of the SNPs, such as FGFR1 rs881301, but not all, may be functionally linked to NAFLD risk via gene-expression alteration.

Discussion
In this study, we developed a FGF21-related PHS to explore the genetic contribution of the FGF21 pathway to NAFLD incidence and sought to ascertain whether the association between the PHS and NAFLD risk is modulated by dietary protein intake level. A few previous studies have demonstrated the genetic contribution of FGF21 pathways to the metabolic condition. For example, Kaess et al. investigated the association of 63 common SNPs in 5 loci involved in the pathway with metabolic phenotypes, including LDL-C, HDL-C, TG, and BMI, and found FGFR2 polymorphism (rs2071616) to be associated with LDL-C in the European population, a phenomenon that was validated by two other European cohorts [57]. Ji et al. reported that SNPs in the KLB gene were correlated with BMI (rs7670903) and hepatic inflammation (rs7674434 and rs12152703) in the Han Chinese population [58]. Although these results suggest that genetic variants linked to the FGF21 pathways are potentially involved in NAFLD pathogenesis, the evidence was based on cross-sectional studies, and the effect size of individual SNPs was limited. In our study, the PHS was developed by selecting 8 out of 226 SNPs at the FGF21 gene and its receptor genes.
The association between PHS and NAFLD risk was confirmed by showing a positive association with NAFLD incidence in both gender (p-trend = 0.0171 and <0.0001 in males and females, respectively, Table 3). The combination of SNPs in the PHS might be related to elevation of the FGF21 pathway. Although how the SNPs impact the risk of NAFLD needs to be further elucidated, GTEx analysis indicated that at least some of the SNPs such as rs881301 at the FGFR1 locus were significantly associated with upregulation of the corresponding gene in various tissues (Figure 3). Since FGFR1 expression was reported to be positively correlated with FGF21 expression [25,26], the eQTL result was in line with the fact that serum level as well as hepatic expression level of FGF21 were positively associated with the intrahepatic steatosis grade and hepatic triglyceride levels, respectively [25][26][27].
Interestingly, inadequate protein intake (<RNI) compared with adequate intake (≥RNI) in women significantly increased NAFLD risk in participants with the lowest PHS values (HR 2.021) but not in those with the highest PHS values (p-interaction = 0.0361, Table 5). These results imply that inadequate protein intake may contribute to NAFLD incidence in people with low genetic risk potentially via FGF21-pathway induction, while those with high genetic risk already have relatively elevated FGF21-pathway activity, regardless of protein intake level. FGF21-pathway stimulation upon protein restriction has been well documented in both human and animal models [34][35][36], exhibiting the upregulation of not only FGF21 but also its receptors. In addition, several GWAS analyses have revealed that gene variants in FGF21 and its receptors are related to diet composition [59][60][61]. FGF21gene variants such as rs838133 and rs838145 are associated with high carbohydrate intake and low protein or fat intake, respectively [59,60]. The results indicate a potential link between the FGF21 pathway and dietary macronutrient distribution. In addition, fructose consumption, which was not accessed in this study, might be a candidate modulator of the association between the FGF21 pathway and the risk of NAFLD. Dietary factors, including fructose consumption, have been extensively studied for their contribution to the risk of NAFLD [62]. Intriguingly, a recent study has indicated that fructose ingestion can stimulate the level of circulating FGF21 [63]. To gain a deeper understanding of this relationship, further studies are required.
The protein-intake-modified association between the PHS and NAFLD risk was more evidently observed in women only. However, the reason for the significant interaction in women remains unclear. It could be due to the sex-differential expression of receptors and response to FGF21. A recent animal study thoroughly investigated the effects of sex and genetic background on metabolic, physiologic, and molecular responses to protein restriction [38]. In fact, FGF21's response to a low-protein diet was sexually dimorphic. Female mice exhibited a significant gain in fat mass in the low-protein group but no differences in body weight and lean mass. In contrast, male mice displayed dramatic loss of body weight and lean mass but no change in fat mass. FGF21 could be responsible for these metabolic changes. Hormonal changes in women related to conditions such as polycystic ovary syn-drome (PCOS) are another potential factor in the development of NAFLD. Recent research, including a meta-analysis of 15 studies, has shown a strong association between PCOS and the risk of NAFLD, independent of BMI [64]. This association has also been confirmed in a study involving Korean women [65]. Since our dataset did not have information about hormonal changes in the participants, further studies are needed to address and minimize the potential bias that might have contribute to the observed difference in women.
This study has certain limitations. First, it was conducted with a limited number of participants from one ethnic population. Genetic association studies are susceptible to population stratification where differences in allele frequency between cases and controls emanate from systematic differences in ancestry. This study's findings require validation using a larger, independent cohort involving other ethnic populations. In addition, the analysis was limited to a relatively old collection of data. Validation of the findings on recent data will be beneficial. Second, we ascertained participants' NAFLD statuses using the NFLS, which is a predictive equation for diagnosing NAFLD [48]. Although ultrasound and biopsy are the gold standards for NAFLD diagnosis, the NFLS possesses high sensitivity and specificity, and this was confirmed in the Korean population [66]. Third, PHS development using multiple SNPs in a specific pathway is a promising approach for predicting the risk of complex diseases. It is based on common SNPs in the genes related to the FGF21 pathway accessible from the Affymetrix Genome-wide SNP Array 5.0. However, we cannot exclude the possibility of additional SNPs that were not available on the array but potentially contributed to NAFLD risk. Fourth, we lacked details regarding prescribed drugs that may affect liver health. Finally, we cannot rule out unmeasured or residual confounding variables.
Notwithstanding, to the best of our knowledge, this is the first study to investigate the genetic contribution of the FGF21 pathway to NAFLD risk using the PHS and establish its modification by dietary intake in Korean adults. In conclusion, in women only, genetic variants in the genes encoding FGF21 and its receptors were collectively associated with NAFLD risk. Moreover, protein intake less than the RNI increased NAFLD risk in the participants with the lowest PHS values; however, it did not affect the NAFLD incidence rate in those with higher PHS values. Further investigation is required to validate these findings.

Supplementary Materials:
The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/nu15102385/s1, Table S1: Information of SNPs in FGF21, FGFR1, FGFR2 and KLB genes.  Informed Consent Statement: Written informed consent has been obtained from the patients to publish this paper.

Data Availability Statement:
The KoGES data are available upon request from the National Research Institute of Health [37]. The GTEx dataset is available from the NIH dbGAP, study number phs000424.v8.p2.