Development and Internal Validation of Fatty Liver Prediction Models in Obese Children and Adolescents

To develop predictive models of fatty liver (FL), we performed a cross-sectional retrospective study of 1672 obese children with a median (interquartile range) age of 15 (13–16) years. The outcome variable was FL diagnosed by ultrasonography. The potential predictors were: (1) binary: sex; (2) continuous: age, body mass index (BMI), waist circumference (WC), alanine transaminase (ALT), aspartate transaminase, gamma-glutamyltransferase, glucose, insulin, homeostasis model assessment of insulin resistance (HOMA-IR), HDL-cholesterol, LDL-cholesterol, triglycerides, mean arterial pressure, uric acid, and c-reactive protein; (3) ordinal: Pubertal status. Bootstrapped multivariable logistic regression with fractional polynomials was used to develop the models. Two models were developed and internally validated, one using BMI and the other using WC as the anthropometric predictor. Both models included ALT, HOMA-IR, triglycerides, and uric acid as predictors, had similar discrimination (c-statistic = 0.81), and were similarly well calibrated as determined by calibration plots. These models should undergo external validation before being employed in clinical or research practice.


Introduction
The prevalence of fatty liver (FL) is increasing worldwide mostly because of the current epidemics of obesity [1,2]. A recent metanalysis performed in subjects aged 1 to 19 years reported a prevalence of FL of 2.3% in normal-weight, 12.5% in overweight, and 36.1% in obese children [2]. Even if the natural history of FL in children is largely unknown and cause-effect relationships between FL and associated diseases are hard to prove with the present evidence base [3], it is reasonable to assume that FL persisting from childhood to adulthood may have harmful hepatic and extrahepatic consequences [4][5][6][7].
FL is operationally defined as visible steatosis in more than 5% of hepatocytes at liver biopsy or as an intrahepatic triglyceride content of at least 5.6% at magnetic resonance spectroscopy (MRS) or magnetic resonance imaging (MRI) [8]. However, the most common method used to diagnose FL in clinical practice and epidemiological research is liver ultrasonography (LUS) [8,9]. When LUS is not available, the presently suggested method to diagnose FL in adults is the calculation of surrogate indices of FL [8].
The use of surrogate indices of FL is becoming increasingly popular in adults [10]. However, very few data are available to date on surrogate indices of FL in children and adolescents. In a cross-sectional study, performed at our Center in 2007 on 267 subjects aged 8 to 18 years and with a body mass index (BMI) > 90th percentile for age, we found that BMI, alanine transaminase (ALT), uric acid, insulin during oral glucose tolerance testing (OGTT), and glucose during OGTT, contributed independently to FL [11]. Although these variables were potential contributors to a multivariable surrogate index of FL, we did not develop a prediction model, mostly because of the low number of available children [11]. Surprisingly, only one study was performed to date with the aim of developing a prediction model for FL in children and adolescents. In 2011, Hosseini et al. developed a prediction model of FL based on sex, age, BMI, waist circumference (WC) and triglycerides in a sample of 962 Iranian subjects aged 6 to 18 years [12]. Unfortunately, Hosseini et al. [12] had not available measurements of liver enzymes, uric acid, insulin, and c-reactive protein (CRP), which are potential predictors of FL [11,13,14].
Insulin was the strongest multivariable predictor of FL in the general adult population of the Dionysos Nutrition and Liver Study [15,16]. However, when we used the data from the Dionysos Nutrition and Liver Study to develop the fatty liver index (FLI), we did not take insulin into account because its measurement was not routinely performed [16]. Nowadays, insulin is almost routinely measured in obese patients and in those with FL, and it will be increasingly so if the newly proposed definition of metabolic (dysfunction) associated fatty liver disease (MAFLD) will replace the more traditional separation of FL into non-alcoholic fatty liver disease (NAFLD) and alcoholic fatty liver disease (AFLD) [13]. To diagnose MAFLD, one has in fact to calculate the homeostasis model assessment of insulin resistance (HOMA-IR), which is obtained from fasting insulin and glucose [17]. The diagnosis of MAFLD requires also the measurement of CRP [13], which may therefore become more common in patients with FL.
The aim of the present cross-sectional study was, therefore, to develop and internally validate multivariable prediction models of FL for obese children taking into account anthropometry, the components of the metabolic syndrome, liver enzymes, insulin and glucose alone and combined into HOMA-IR, uric acid, and CRP. The study was reported following the "Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis" (TRIPOD) guidelines (Appendix A).

Source of Data
The development and internal validation of the multivariable models for the prediction of FL were performed using an already available dataset of obese children followed at our tertiary care center for the treatment of pediatric obesity [18]. The study was approved by the Ethical Committee of the Istituto Auxologico Italiano (research project code 1C021_2020, acronym BILOB) and was conducted in accordance with the Declaration of Helsinki. Written informed consent to participate in the study was obtained from the subjects aged 18 years or from the legal representatives of those aged < 18 years.

Participants
All the children were admitted to the Clinic to undergo a short-term structured multidisciplinary weight-loss program and underwent the assessment of the outcome and the predictors before starting such program. The inclusion criteria were: (1) Caucasian ethnic group, (2) age ≤ 18 years, (3) BMI ≥ 95th percentile for age according to Italian growth charts [19], and (4) availability of LUS. The exclusion criteria were: (1) Genetic or syndromic obesity, (2) treatment with any drug, (3) alcohol intake (any quantity), and (4) hepatitis B virus (HBV) or hepatitis C virus (HCV) infection. All children underwent a detailed clinical history and physical examination. Alcohol intake was assessed by interview. Pubertal status was classified by a pediatric endocrinologist in 5 stages according to Tanner [20]. Weight and height were measured following international guidelines [21]. WC was measured at the midpoint between the last rib and the iliac crest. BMI was calculated as weight (kg)/height (m) 2 . Standard deviation scores (SDS) of weight, height and BMI were calculated using Italian growth charts [19]. Performed blood tests included: (1) Alanine transaminase (ALT), (2) aspartate transaminase (AST), (3) gamma-glutamyltransferase (GGT), (4) glucose, (5) insulin, (6) cholesterol, (7) HDL-cholesterol, (8) LDL-cholesterol, (9) triglycerides, and (10) CRP. All blood tests were performed in the fasting state and evaluated using a Cobas 6000 analyzer (Roche Diagnostics, Monza, Italy). Systolic (SBP) and diastolic blood pressure (DBP) was measured using a sphygmomanometer following international guidelines. Mean arterial pressure (MAP) was calculated as SBP*(1/3) + DBP*(2/3). The metabolic syndrome was diagnosed using the criteria of the International Diabetes Federation [22]. HOMA-IR was calculated as (insulin*glucose)/405 [17].

Outcome
The outcome was FL diagnosed by LUS [11,23]: (1) Normal liver was defined as the absence of liver steatosis or other liver abnormalities; (2) mild FL as the presence of slight bright liver or hepatorenal echo contrast without intrahepatic vessels blurring and no deep attenuation; (3) moderate FL as the presence of mild bright liver or hepatorenal echo contrast without intrahepatic vessel blurring and with deep attenuation; (4) severe FL as diffusely severe bright liver or hepatorenal echo contrast, with intrahepatic vessels blurring (no visible borders) and deep attenuation without visibility of the diaphragm. For the present analysis, FL was coded as 0 = normal liver and 1 = any degree of FL. Because of the exclusion of any degree of alcohol intake and of HBV and HCV infections (see Section 2.2), FL as determined by the present study is synonym with NAFLD. One should note, however, that our threshold for alcohol intake (0 g/day) is much lower than that currently used to define NAFLD (<30 g/day in men and <20 g/day in women) [8].

Predictors
The following potential predictors were considered: (1) binary: Sex (0 = female; 1 = male); (2) continuous: age (years), BMI (kg/m 2 ), WC (cm), ALT (U/L), AST (U/L), GGT (U/L), glucose (mg/dL), insulin (µU/mL), HOMA-IR (dimensionless), HDL-cholesterol (mg/dL), LDL-cholesterol (mg/dL), triglycerides (mg/dL), MAP (mm Hg), uric acid (mg/dL), and CRP (mg/l); (3) ordinal: Pubertal status (5 Tanner stages). The predictors included all the continuous components of the metabolic syndrome, i.e., WC, glucose, HDLcholesterol, triglycerides, and blood pressure [22]. MAP was used as measurement of blood pressure to avoid problems of multicollinearity stemming from having SBP and DBP in the same model [24]. The predictors included liver enzymes (ALT, AST, GGT), which have been associated with FL in children [11]. As discussed in detail in the Introduction, insulin was included among the predictors because it was the most effective multivariable predictor of FL in the Dionysos Nutrition and Liver study [16], and because its measurement is required to calculate HOMA-IR, which is needed to diagnose MAFLD [13]; CRP was included because it is similarly required to diagnose MAFLD [13]; lastly, uric acid was included because it has been reported as a promising predictor of FL [14].

Sample Size
This is a cross-sectional retrospective study performed on an already available dataset of 1672 children and adolescents (see Section 2.1). The present sample size calculation was therefore post-hoc. For the reasons discussed at Section 2.4, we were interested to evaluate the multivariable association of fatty liver with the following predictors: (1) sex, (2) age,  [25,26], we calculated that 1535 children were needed to estimate a Cox-Snell R 2 of 0.11 with a prevalence of 38% of the outcome, under the expectation of (1) a shrinkage of predictor effects < 10%, (2) a difference of 5% in the model apparent and adjusted Nagelkerke R 2 value and, (3) an estimation within 5% of the average outcome risk in the population. Following the suggestion of Riley et al. [25,26], we choose the lowest vale of Cox-Snell R 2 that we detected in our experience at developing multivariable models for the prediction of fatty liver. We also ignored likely multicollinear terms, e.g., age and pubertal status [18], that we expected to remove in the model development phase [24,27], to err on the side of including more subjects for the prediction, as we already had a large dataset at our disposal. The available sample size of 1672 children was thus more than enough to develop the models of interest.

Missing Data
There were no missing data.

Statistical Analysis
Most continuous variables were not Gaussian-distributed, and all are reported as median (50th percentile) and interquartile range (25th and 75th percentiles). Discrete variables are reported as the number and proportion of subjects with the characteristic of interest. As suggested by Royston and Sauerbrei [27,28], we examined the Spearman correlation matrix between the outcome and the potential predictors before embarking into multivariable modeling and kept only one of the variables of highly correlated clusters, as defined by a Spearman's rho > |0.60|, they key criterion being its clinical availability. The importance and the functional form of the predictors was evaluated using degree 2 fractional polynomials coupled to logistic regression in 1000 bootstrap samples with replacement [27][28][29]. Only predictors with a bootstrap inclusion factor (BIF) ≥ 66%, i.e., occurring in more than two thirds of bootstrap samples, were included in the final models. The 95% confidence intervals (95%CI) of the predictors of the final models were calculated using bootstrap in 1000 samples with replacement. Discrimination was evaluated using Harrell's c-statistic, which is equivalent to the area under the receiver operating characteristic curve [30]. As measures of model fit, besides Cox-Snell R 2 and Nagelkerke R 2 , we calculated the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) [23]. Internal calibration was evaluated using Van Calster's 3-level hierarchy [31,32]: (1) "Mean calibration", which compares the observed event rate with the average predicted risk; (2) "weak calibration", which tests whether the calibration slope is 1 and the calibration intercept is 0; (3) "moderate calibration", which uses a calibration plot with a superimposed locally weighted scatterplot smoother to test whether the predicted risks correspond to the observed event rates [33]. The 95%CI of the calibration slope and intercept were calculated using bootstrap on 2000 samples with replacement [31]. Statistical analysis was performed using Stata 16.1 (Stata Corporation, College Station, TX, USA) with the pmsampsize module [34], and R 4.0.4 (R Core Team 2021, Vienna, Austria) with the val.prob.ci.2 function [31]. R code was run from within Stata using the rcall package [35]. Table 1 gives the measurements of the 1672 study subjects. They had a median (IQR) age of 15 (13-16) years, ranging from 5 to 18 years, and most of them (748, 45%) were postpubertal. FL was detected in 642/1672 (38%, 95%CI 36 to 41%) of the study subjects. Continuous variables are reported as median (50th percentile) and interquartile range (25th and 75th percentiles). Discrete variables are reported as number and proportion. Abbreviations: SDS = standard deviation scores [19]; BMI = body mass index; IDF = International Diabetes Federation [22]; ALT = alanine transaminase; AST = aspartate transaminase; GGT = gamma-glutamyltransferase; HOMA-IR = homeostasis model assessment of insulin resistance; HDL = high density-lipoprotein; LDL = low density-lipoprotein, CRP = c-reactive protein.

Selection of Predictors for Multivariable Modeling
As suggested by Royston and Sauerbrei [27,28], we examined the Spearman correlation matrix between the outcome and the potential predictors before embarking into multivariable modeling (Table 2). Only one of the variables pertaining to highly correlated clusters was kept, as detected by a Spearman's rho > |0.60|. Age and pubertal status were highly correlated (rho = 0.82). Age was kept for multivariable modeling because the assessment of pubertal status requires a pediatric endocrinologist, rendering a prediction model including pubertal status not usable in standard clinical practice. BMI and WC were highly correlated (rho = 0.77). Because we aimed at detecting whether the single components of the metabolic syndrome contribute to FL and WC is a component of the metabolic syndrome, we decided to develop two distinct multivariable models, one using BMI and the other using WC as the anthropometric predictor. It should be noted that, in the present study, we used BMI (kg/m 2 ) as predictor instead of BMI (SDS), which was used in our previous study [11]. This was done because "raw" BMI is not dependent on any given reference chart and allows a more direct comparison with WC, which was not evaluated in our previous study [11]. ALT was highly correlated with AST (rho = 0.81) and GGT (rho = 0.61). ALT was kept for multivariable modeling because it is more hepatospecific than AST, and because GGT is as a second-level exam in most centers. Lastly, HOMA-IR and insulin were correlated so strongly (rho = 0.98) to be considered synonyms for modeling purposes. We kept HOMA-IR for multivariable modeling instead of insulin because of its pathophysiological significance and because it is presently required for the diagnosis of MAFLD [13]. Glucose was not entered into the multivariable models because it is already included into HOMA-IR (See Section 2.2).

Multivariable Modeling Strategy
Based on the above findings, we decided to evaluate two distinct multivariable models, one using BMI and the other using WC as the anthropometric predictor. Both models included sex, age, ALT, HOMA-IR, HDL-cholesterol, LDL-cholesterol, triglycerides, MAP, uric acid, and CRP as potential predictors. The importance and the functional form of each predictor was evaluated using degree 2 fractional polynomials in 1000 bootstrap samples with replacement [27][28][29]. The bootstrap inclusion fraction (BIF) and the functional form of the fractional terms 1 (BIF-1) and 2 (BIF-2) of the polynomials of the predictors are given in Table 3. Only predictors with BIF-1 ≥ 66%, i.e., selected in at least two thirds of bootstrap samples, were considered for inclusion in the final multivariable models. As the BMI-based multivariable model is concerned, ALT had the highest BIF-1 (100%), followed by BMI (98%), age (95%), HOMA-IR (95%), uric acid (79%) and triglycerides (74%). All variables were selected as linear except for ALT and HOMA-IR. A second fractional polynomial term was needed for ALT (BIF-2 = 90%), including ALT −2 and ALT −1 , and for HOMA-IR (BIF-2 = 51%), including HOMA-IR and HOMA-IR 2 .

Multivariable Models
The final multivariable models are given in Table 4. The corresponding regression equations are given in Appendix B.
The discrimination of the BMI model was good (c-statistic = 0.81, 95%CI 0.79 to 0.83). Figure 1 gives the calibration plot of the same model. The average expected rate of fatty liver (38%, 95%CI 36% to 40%) equaled the average observed rate (38%, 95%CI 36% to 41%), showing a satisfactory mean calibration. At logistic calibration, the average calibration slope was 1 and the average intercept was 0, showing a satisfactory weak calibration. Lastly, the examination of calibration plots showed an acceptable profile of moderate calibration (Figure 1).  The discrimination of the WC Model was good (c-statistic = 0.81, 95%CI 0.78 to 0.83). Figure 2 gives the calibration plot of the same model. The average expected rate of fatty liver (38%, 95%CI 36% to 40%) equaled the average observed rate (38%, 95%CI 36% to 41%), showing a satisfactory mean calibration. At logistic calibration, the average calibration slope was 1 and the average intercept was 0, showing a satisfactory weak calibration. Lastly, the examination of calibration plots showed an acceptable profile of moderate calibration (Figure 2).

Discussion
The aim of the present study was to develop and internally validate multivariable models for the prediction of FL in obese children and adolescents. Using a regression modeling strategy based on the bootstrap [27,28], we developed and internally validated two prediction models of FL, one using BMI and the other using WC as the anthropometric predictor. Both models showed good discrimination and internal calibration (Table 4 and Figures 1 and 2).
Our prediction models are obviously not intended to replace LUS. Provided that they are externally validated, these models could be used as surrogate indices of FL in obese children and adolescents in contexts where LUS is not available, as is currently suggested for adults [8]. Moreover, always after external validation, these models could be used to develop algorithms to rule in or rule in or out FL at certain values of the model score (Appendix B) [16]. The good internal calibration of these indices does not imply, of course, a similarly good external calibration [10].
Although this is the largest study performed so far in obese children and adolescents with the aim of developing multivariable models for the prediction of FL, it is not without limitations. The first limitation is that our models were developed on severely obese and adolescents and may therefore not apply to non-obese subjects. The median SDS of BMI of our cross-section of children and adolescents is in fact greater than the 99th percentile of the reference distribution [19]. Although all the predictors identified by our regression modeling strategy are in keeping with our current pathophysiological and clinical knowledge about FL [10,11,16,23,36], there is no reason to believe that the underlying prediction models will perform equally well in non-obese children and adolescents. Besides needing external validation on obese children and adolescents to assess their true performance [33,37], our prediction models will have to undergo separate evaluation in non-obese children if one plans to use them in non-obese subjects. The second limitation is that we studied only Caucasian children, the reason being that non-Caucasian individuals with obesity account for less than 2% of the patients currently followed at our Center [38]. The third limitation is that LUS, our diagnostic method, is known to offer an accurate assessment of FL only starting from an intrahepatic triglyceride content of at least 10% [39], implying a number of "false negatives", i.e., missed cases of FL, as compared to MRI or MRS. However, LUS is presently the only feasible option to perform large studies and most of the surrogate indexes of FL currently employed in adults were developed using LUS as the reference method [40].
In addition to BMI or WC as the anthropometric predictor, the two multivariable models that we developed and internally validated had the same set of predictors: ALT, HOMA-IR, triglycerides and uric acid.
BMI and WC were entered as predictors of separate regression models to avoid multicollinearity ( Table 2). Multicollinearity was not an issue when we developed FLI, which includes both BMI and WC, in the general population of Campogalliano (Modena, Italy) [16]. However, multicollinearity was present when we modelled the association between FL and potential risk factors in the general population of Bagnacavallo (Ravenna, Italy) [23]. Although this is not specified, multicollinearity between BMI and WC was likely not an issue for the development of the only prediction model of FL available to date for children, which in fact includes sex, age, BMI, and WC as predictors [12]. Quite interestingly, when BMI was replaced by WC (Table 4), the changes in the regression coefficients of the other predictors were small, reinforcing the idea that, in our cross-section of obese children, BMI and WC are interchangeable measures as the prediction of FL is concerned ( Table 2). Potential advantages of BMI over WC are that: (1) its components, i.e., weight and height, are routinely measured in children of all ages; (2) its measurement is more precise than that of WC in severely obese subjects; and (3) it was recently shown to be associated with incidence and remission of NAFLD in children [41].
ALT was the only predictor chosen in all bootstrap samples in both the BMI and WC models (Table 3), confirming our previous finding of a strong multivariable association of FL with ALT [11]. ALT was highly correlated with AST and GGT (Table 2) and, to avoid multicollinearity, we choose to keep it in the models because it is more hepatospecific than AST, and, contrary to GGT, it is routinely measured as "first-level" liver enzyme. As we discussed in detail elsewhere [11], the fact that ALT is confirmed to be a component of a multivariable predictor of FL does not imply that it can be used alone to discriminate children with from those without FL.
HOMA-IR was not an independent predictor of FL in our previous study of 278 children aged 8 to 18 years with BMI > 90th percentile for age and the same was true for its components, i.e., fasting glucose and insulin, as evaluated by distinct logistic regression models [11]. However, in that study both the area under the curve of glucose and that of insulin during OGTT were independent predictors of FL [11]. Besides the larger sample size (N = 1672), the wider age (5 to 18 years) of the subjects, and the higher entry criterion for BMI (≥ 95th percentile for age) of the present study, it is possible that the more sophisticated multivariable selection of predictors used for the present analysis has allowed HOMA-IR to show its full potential as predictor of FL [28].
Interestingly, of the components of the metabolic syndrome besides WC, only triglycerides were independent predictors of FL in both the BMI and WC models, confirming our previous findings [11], those of Hosseini et al. [12], and what is more generally known about the hypertriglyceridemic waist phenotype [36].
Uric acid was identified as an independent predictor of FL in both the BMI and WC models, confirming our earlier findings [11] and the increasing evidence linking uric acid levels with NAFLD [4,14,42].
The BIF-1 of CRP was under the prespecified threshold of 66% to accept it for inclusion in both the BMI and WC models, and this was especially evident for the BMI model (Table 3). This finding is in agreement with our previous study, where CRP was not an independent predictor with FL [11].

Conclusions
In conclusion, in a large sample of obese children and adolescents, FL can be accurately diagnosed by using multivariable models based on BMI or WC, ALT, HOMA-IR, triglycerides, and uric acid. These models should undergo external validation, consisting of both discrimination and calibration [10], before being employed in clinical or research practice.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki and was approved by the Ethical Committee of the Istituto Auxologico Italiano (research project code: 1C021_2020, acronym: BILOB).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on reasonable request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest. Report any actions to blind assessment of the outcome to be predicted. (The prediction models were developed retrospectively).

Predictors 7a
Clearly define all predictors used in developing or validating the multivariable prediction model, including how and when they were measured.

P3 7b
Report any actions to blind assessment of predictors for the outcome and other predictors.
(The prediction models were developed retrospectively).