Development of a Computational Model to Predict Excess Body Fat in Adolescents through Low Cost Variables

Background: Excess body fat has been growing alarmingly among adolescents, especially in low- and middle-income countries where access to health services is scarce. Currently, the main method for assessing overweight in adolescents is the body mass index (BMI), but its use is criticized because, despite its high specificity, it has low sensitivity, which may lead to a late diagnosis of comorbidities associated with excess body fat, such as cardiovascular diseases. Thus, the aim of this study was to develop a computational model using linear regression to predict obesity in adolescents and compare it with commonly used anthropometric methods. To improve the performance of our model, we estimated the percentage of fat and then classified the nutritional status of these adolescents. Methods: The model was developed using easily measurable socio-demographic and clinical variables from a database of 772 adolescents of both genders, aged 10–19 years. The predictive performance was evaluated by the following metrics: accuracy, sensitivity, specificity, and area under the ROC curve (AUROC). The performance of the method was compared to two anthropometric parameters: body mass index and waist-to-height ratio (WHtR). Results: Our model showed a high correlation (R = 0.80) with the body fat percentage value obtained through bioimpedance. In addition, regarding discrimination, our model obtained better results than BMI and WHtR: AUROC = 0.80, 0.64, and 0.55, respectively. It also presented a high sensitivity of 92% and a low false negative rate (6%), while BMI and WHtR showed low sensitivity (27% and 9.9%) and a high false negative rate (65% and 53%), respectively. Conclusions: The computational model of this study performed better in the evaluation of excess body fat in adolescents than the usual anthropometric indicators, presenting itself as a low cost alternative for screening obesity in adolescents living in Brazilian regions where financial resources are scarce.


Introduction
The prevalence of excess body fat has been growing alarmingly worldwide, and this increase is also observed among teenagers [1,2]. In certain developed countries, the prevalence of obesity among adolescents has reached high levels [3,4]; in the USA, for example, 17% of teenagers are considered obese [4]. Statistics are also alarming in developing countries like Brazil, where the rate of obesity is also growing rapidly and nearly 20% of teenagers are obese [1]. Consequently, the assessment of nutritional status plays a relevant role in monitoring fluctuations in the body composition of the individual, as well as in the survival rate under pathological conditions, since it allows the early diagnosis of comorbidities associated with overweight, such as cardiovascular disease and endocrine disorders [5,6].
Among the several techniques for assessing body composition, bioimpedance analysis (BIA) stands out as a fast, portable, and non-invasive method that uses the principle of electrical impedance. Moreover, it presents little technical difficulty and correlates highly with dual energy X-ray absorptiometry (DXA), and its use is indicated in epidemiological studies and clinical practice [7][8][9][10][11]. However, the use of anthropometric parameters is still more viable because of their low cost when compared to BIA.
The most commonly used anthropometric indicator in the assessment of adolescent nutritional status is the body mass index (BMI), owing to its low cost, simplicity, and high reproducibility. However, the efficiency of BMI is controversial, since it has low sensitivity in predicting excess body fat [12][13][14][15][16]. In addition, individuals with a high percentage of lean mass have a higher total weight and, on BMI assessment, may be mistakenly classified as obese.
Thus, identifying individuals with a subclinically high body fat percentage is an important step in finding those who need earlier interventions, especially among adolescents. Early detection of excess body fat through low cost and high sensitivity methods allows the implementation of appropriate therapeutic approaches to mitigate the development of comorbidities associated with excess body fat [17]. Therefore, we propose a statistical method to predict obesity in adolescents, using low cost and easily verifiable variables (such as gender, age, height, and body mass).

Construction of The Database
The database is composed of 772 adolescents. The study included adolescents of both genders, aged 10 to 19 years, duly enrolled in public schools in São Luís, MA. Exclusion criteria were: pregnancy, lactation, use of birth control pills, menstruating at the time of assessment, eating disorders, malnutrition, dehydration, body water retention, refusal to participate in the study, and absence from data collection sessions. A single researcher performed each measurement with the same calibrated instrument. Based on the protocol by Lohman [18], the measurements were performed in duplicate at a single time point (cross-sectional study), and for data analysis, the mean values of the collected measurements were calculated. The variables evaluated were socio-demographic (age and gender) and anthropometric (body mass, height, body circumferences, and body fat percentage). The present study was approved by the Ethics Committee in Research with Human Beings of the Federal University of Maranhão according to legal opinion CAAE: 83206118.1.0000.5087.

Sample Calculation
The sample size was calculated by proportion estimation [19], based on a prevalence of overweight in adolescents of 20.5% [20], which suggested an outcome prevalence of 26.9% [21], a tolerable error of 5% (type I error), and a test power of 90% (type II error), with 10% added for possible losses or refusals. This yielded a minimum sample of 513 teenagers.

Data Collection
Body mass was measured with a calibrated electronic scale (Omron® HBF 214 LA, Japan) with a precision of 0.1 kg. Height was determined with a portable vertical stadiometer with a precision of 0.1 cm (Sanny®, Brazil). For the measurement of circumferences, an anthropometric tape measure with 0.1 cm precision was used (Seca® 213, Hamburg, Germany). Waist circumference was measured at the midpoint between the last rib and the iliac crest at minimum respiration [22]. Hip circumference was measured at the widest point as described by Huang et al. [23]. Arm circumference was evaluated at the midpoint between the olecranon and the acromial process on the upper left arm with the subject in standing position [24]. Calf circumference was measured at the point of largest circumference, with the participant seated and the knee bent at a ninety-degree angle [25].
The BMI was calculated as body mass (kg) divided by height squared (m²), and the cutoff point used for classification of nutritional status was the one recommended by the World Health Organization for gender and age, where values above the 3rd percentile and below the 85th percentile were classified as eutrophic and values above the 85th percentile as overweight [26,27]. The waist-to-height ratio (WHtR) was calculated as waist circumference (cm) divided by height (cm). For WHtR evaluation, the cutoff point used was 0.5 for both genders and all ages [28]. Age was calculated as the difference between the date of birth and the date of measurement, and gender and ethnicity were self-declared by the participant.
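The two indices defined above reduce to simple arithmetic. The following Python sketch is illustrative only: the function names and the example measurements are hypothetical, treating a ratio at or above 0.5 as elevated is an assumption (the text gives only the cutoff value), and the WHO percentile lookup needed to classify BMI by gender and age is not reproduced here.

```python
def bmi(body_mass_kg: float, height_m: float) -> float:
    """Body mass index: body mass (kg) divided by height squared (m^2)."""
    return body_mass_kg / height_m ** 2


def whtr(waist_cm: float, height_cm: float) -> float:
    """Waist-to-height ratio: both measurements in the same unit (cm)."""
    return waist_cm / height_cm


def whtr_is_elevated(waist_cm: float, height_cm: float, cutoff: float = 0.5) -> bool:
    """Apply the 0.5 cutoff; 'at or above' is an assumption of this sketch."""
    return whtr(waist_cm, height_cm) >= cutoff


# Hypothetical adolescent: 50 kg, 1.60 m tall, 72 cm waist.
example_bmi = bmi(50.0, 1.60)
example_whtr = whtr(72.0, 160.0)
example_flag = whtr_is_elevated(72.0, 160.0)
```

Classifying the resulting BMI as eutrophic or overweight would additionally require the WHO gender- and age-specific percentile tables cited in the text.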
The percentage of body fat was obtained through the (calibrated) tetrapolar bioimpedance method (Maltron 906BF®, England). All established procedures were followed. Participants fasted for 2 hours before the evaluation, did not drink alcohol or perform vigorous exercise in the 24 hours prior to the exam, and urinated at least 30 minutes before the test. Measurements were taken with the individual in supine position and without any metallic objects on their body [29][30][31]. For the classification of the adolescents' nutritional status through the body fat percentage (BFP), the following cutoff points were used [32,33]: for males, 10.1-20% was considered normal BFP and ≥20.1% high BFP; for females, 15.1-25% was considered normal BFP and ≥25.1% high BFP.
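The gender-specific cutoffs for high BFP can be expressed as a small lookup. This is a minimal sketch: the function name and the "male"/"female" keys are hypothetical, and only the high-BFP thresholds stated in the text (≥20.1% for males, ≥25.1% for females) are encoded.

```python
# Thresholds at or above which BFP is considered high, taken from the text.
HIGH_BFP_CUTOFF = {"male": 20.1, "female": 25.1}


def has_high_bfp(bfp_percent: float, gender: str) -> bool:
    """Return True when the body fat percentage meets the high-BFP cutoff."""
    if gender not in HIGH_BFP_CUTOFF:
        raise ValueError(f"unknown gender label: {gender!r}")
    return bfp_percent >= HIGH_BFP_CUTOFF[gender]


# Hypothetical readings: the same BFP is high for a male but normal for a female.
male_result = has_high_bfp(22.0, "male")
female_result = has_high_bfp(22.0, "female")
```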
For BFP measurement through DXA, the GE Healthcare Lunar Prodigy device was used, and the scans were analyzed using software version 14.10 (GE Healthcare). This analysis was performed in a subgroup of 12 adolescents also from the public school system of São Luís, MA, in order to show the correlation between the value of BFP obtained by bioimpedance and by DXA.

Predictor Variables
All the characteristics used as inputs to the model (Table 1) were chosen because they are low cost and easy to apply. They also had to be described in the literature for the assessment of the nutritional status and health of adolescents. The variables age, body mass, height, and gender are indexes recommended by the World Health Organization and have been measured in several studies for the nutritional analysis of this population [1,27,28,34–36]. Those variables are also used as criteria for the evaluation and classification of BMI and waist circumference (for example, References [17,22,37]). Waist circumference, in turn, is already a consolidated index for the analysis of central fat and cardiovascular risk [37][38][39]. In addition, the circumferences of the arm, hip, and calf are used to assess nutritional status and population health [22,23,25,40–46]. Other factors were also considered, such as the low cost, reproducibility, and accessibility of the input attributes, especially if the method is used in remote places.

Statistical Model
In many areas of knowledge, such as engineering and health, many problems involve investigating the relationship between two or more variables [47][48][49][50]. Multiple linear regression (MLR) is a statistical technique widely used in the literature to verify the relationship between a dependent variable and several independent variables [51]. Therefore, to build the computational model to predict body fat percentage, the concept of multiple linear regression was applied, and MATLAB ® was used to build the model.
The MLR is based on least squares [51], which minimizes the error between the actual results (BFP obtained by BIA) and the results estimated by the model on the training set. The multiple linear regression aims to find an estimate of the real output by means of Equation (1):

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon \qquad (1)$$

where $y$ represents the dependent variable, $x_i$ the independent variables, $\beta_i$ the regression coefficients (with $\beta_0$ the intercept), and $\varepsilon$ the error term. Our model has eight predictive (independent) variables, described in Table 1, and the model output (dependent variable) is the estimated body fat percentage value. After obtaining the BFP value through the model, the participant's nutritional status was assessed using the Lohman study cutoff point [32,33].
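As an illustration of least-squares fitting for a model with eight predictors, the sketch below uses Python with NumPy rather than the MATLAB implementation used in the study. The data are synthetic, and the coefficient values and function names are hypothetical; they do not correspond to the fitted model or to the variables in Table 1.

```python
import numpy as np


def fit_mlr(X, y):
    """Estimate intercept and slopes by ordinary least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict_mlr(beta, X):
    """Apply the fitted linear equation to new predictor rows."""
    X = np.asarray(X, dtype=float)
    return beta[0] + X @ beta[1:]


# Synthetic example: 200 subjects, 8 predictors, known coefficients plus noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 8))
true_beta = np.array([22.0, 1.5, -0.8, 0.3, 2.0, 0.0, 0.7, -1.2, 0.4])
y_demo = true_beta[0] + X_demo @ true_beta[1:] + rng.normal(scale=0.1, size=200)

beta_hat = fit_mlr(X_demo, y_demo)
y_hat = predict_mlr(beta_hat, X_demo)
```

With little noise, the recovered coefficients in `beta_hat` lie close to `true_beta`; with real anthropometric data the fit would be looser, as the reported correlation of R = 0.80 suggests.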
To promote generalization and avoid overfitting in the proposed method, K-fold cross-validation was used, which consists of dividing the data into k segments (folds) of equal or nearly equal size. Training and testing are then performed over k iterations: in each iteration, one fold is held out for testing and the model is trained on the remaining k-1 folds [52]. Based on Lopes et al. [53], Afzal et al. [54], Song et al. [50], and Chang et al. [55], our dataset was randomly divided into five subsets (k = 5). Thus, from the 772 adolescents, 618 were randomly chosen to compose the training set and 154 were used as the test set (i.e., a proportion of 80% of the data for training and 20% for testing).
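The partitioning described above can be sketched in Python with NumPy as follows. The function name and the random seed are hypothetical, and this is not the study's MATLAB code; it only shows how 772 samples split into five folds.

```python
import numpy as np


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    # array_split tolerates sizes that are not an exact multiple of k.
    folds = np.array_split(shuffled, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


splits = list(k_fold_indices(772, k=5))
fold_sizes = [len(test_idx) for _, test_idx in splits]
```

Because 772 is not divisible by 5, two folds hold 155 subjects and three hold 154, so each iteration trains on 617 or 618 subjects, in line with the approximate 80/20 proportion reported.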

Performance Analysis
The performance of the method was evaluated in terms of sensitivity (Sens: percentage of positive cases that are correctly identified), specificity (Spe: percentage of negative cases that are correctly identified), and accuracy (Accu: percentage of cases that are correctly identified among all subjects). To obtain these measurements, the values of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) were calculated [56]. The area under the receiver operating characteristic curve (AUROC) and its confidence intervals were also determined. The cutoff points used for the BMI, WHtR, and BFP performance analysis are those described in the data collection section.
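The metrics above follow directly from the four confusion-matrix counts, and the AUROC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The Python sketch below is illustrative: the function names, counts, and scores are hypothetical and are not taken from the study's tables, and the functions assume both classes are present (no zero-division handling).

```python
def screening_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical screening result for 100 subjects.
metrics = screening_metrics(tp=40, fp=10, tn=45, fn=5)

# Hypothetical scores: label 1 = high BFP, label 0 = normal BFP.
demo_auroc = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The pairwise formulation is quadratic in the number of subjects, which is acceptable for a test set of about 150 adolescents; a rank-based computation would be preferable for much larger samples.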

Statistical Analysis
For the statistical analysis, SPSS software (Statistical Package for the Social Sciences, Inc., Chicago, IL, USA) version 25.0 was used. Data were treated with descriptive procedures (median and interquartile range). To compare groups with normal and altered BFP, the Mann-Whitney U test was used. The chi-squared test was performed to verify the frequency of categorical variables. Pearson's correlation was used to evaluate the degree of correlation between the value estimated by the model and the real one (obtained by BIA), as well as the correlation between the BFP values obtained by BIA and by DXA. The results were considered statistically significant if the p-value was < 0.05.

Results
Table 2 presents the general characteristics of the sample, composed of 772 adolescents aged 10 to 19 years. The BFP obtained by BIA showed a high correlation (R = 0.96) with DXA BFP in the validation subgroup (n = 12), with a confidence interval of 0.85-0.98. Figure 1 presents the relationship between the DXA value and the BIA BFP value, demonstrating the validity of BIA for BFP estimation in this population.
When comparing the model in relation to the body fat percentage (BIA), there was a significant association between the BFP and the chosen input attributes (Table 3). The proposed method presented high correlation with the BFP value obtained through bioimpedance (R = 0.80), with a confidence interval of 0.73-0.85. Figure 2 presents the relation between the real value (BIA) and the value estimated by our method.
To verify the performance of our method during the test phase, relative to the obesity indicators commonly used in clinical practice and epidemiological studies, the performance indicators AUROC, accuracy, sensitivity, and specificity were analyzed. Our model presented excellent discriminatory power with respect to AUROC, as shown in Table 4 and represented in Figure 3. In Table 4, it is observed that BMI and WHtR present low AUROC discriminatory power when compared to the proposed method. The WHtR showed low performance compared to the other methods evaluated, presenting a confidence interval less than 0.5 and low sensitivity. Similarly to BMI, the WHtR failed to diagnose more than 50% of the sample of adolescents with excess body fat. Besides, our model showed better performance than the BMI and WHtR indicators, with higher sensitivity than either (Table 4). The development and implementation of a sensitive model is of great importance, as a screening method must present high sensitivity, especially if it is used in the analysis of high body fat in adolescents.
Figure 3 is a graphical representation of the ROC curve, which is generated by plotting sensitivity (true positive rate) on the y-axis against 1 - specificity (false positive rate) on the x-axis. For a diagnostic test to be considered useful, its curve must lie in the upper left triangle, above the reference line. The higher the curve, the better the model [57].
Adolescence can be divided into three stages [58]: early adolescence (10-14 years of age), late adolescence (15-19 years of age), and young adulthood (20-24 years of age). Our test set was divided by gender into two age groups: 10-14 years of age (early adolescence) and 15-19 years of age (late adolescence). For all groups, the measures of accuracy, sensitivity, and specificity were calculated in order to evaluate our method against the anthropometric indicators; the values are set forth in Table 5. Our method continued to present better sensitivity than the BMI and WHtR anthropometric indices (Table 5).

Table 5. Analysis of the performance of the proposed method in the data set relative to the anthropometric indicators BMI and WHtR, stratified by gender and age. Each block of three columns corresponds to one gender and age stratum. All values are percentages.

        MP   BMI  WHtR |  MP   BMI  WHtR |  MP   BMI  WHtR |  MP   BMI  WHtR
Accu    84    46    46 |  82    73    64 |  76    53    23 |  89    32    20
Sens    75    12    12 |  80    40    20 | 100    38     0 |  96    23     8
Spe    100   100   100 |  83   100   100 |   0   100   100 |  40   100   100
TP      46     8     8 |  36    18     9 |  76    29     0 |  84    20     7
TN      39    39    39 |  46    55    55 |   0    24    24 |   5    13    13
FP       0     0     0 |   9     0     0 |  24     0     0 |   7     0     0
FN      15    53    53 |   9    27    36 |   0    47    76 |   4    67    80

Abbreviations: MP, proposed method; WHtR, waist-to-height ratio; BMI, body mass index; Accu, accuracy; Sens, sensitivity; Spe, specificity; TP, true positives; FP, false positives; TN, true negatives; FN, false negatives.

Discussion
Adolescence is a stage of physical and intellectual changes [2,59] in which intense body growth interferes with the accumulation and distribution of body fat [44]. It is therefore seen as one of the most critical periods for the development of obesity [17]. In the assessment of nutritional status, one of the most relevant indicators is the body fat percentage (BFP), which is an independent risk factor for insulin resistance and a strong predictor of morbidity, as well as a key parameter for the preventive and therapeutic intervention of pediatric obesity [60].
Measurements of body composition derived from BIA are valuable for the analysis of the nutritional status of pediatric patients [11,61]. BIA is one of the most reliable and affordable methods for assessing body fat and has a high correlation (r = 0.92-0.96) with dual energy X-ray absorptiometry [9], which is consistent with our results (r = 0.96). Despite these advantages, BIA still has a high cost when compared to the use of anthropometric indicators and clinical variables.
Thus, we built a method to assess nutritional status in adolescents by estimating body fat percentage using low cost and easily applied parameters. The method used for estimation was MLR, which has already been applied successfully to several problems, such as clinical data analysis [47], verifying the association between autonomic cardiac function and clinical variables [49], investigating the effects of food contamination on gastrointestinal tract morbidity [50], and soil density measurement [48].
In the assessment of excess body fat, the proposed method obtained better performance (accuracy, sensitivity, and AUROC) than the anthropometric indicators BMI and WHtR, which are usually used to assess the nutritional status of adolescents. It is an attractive method, with low cost and easy application when compared to other methods of body composition analysis such as bioimpedance. The better performance of our method can be explained by the use of predictor variables such as body circumferences (Table 1), which are widespread measures in the literature for the assessment of obesity in this population and are associated with the presence of visceral fat and cardiovascular risk factors [38,62–64], as well as by the combined use of indicators such as gender, age, height, and body mass, which are widely used in the analysis of the nutritional profile and health status of the juvenile population [1,17,34,65].
The body mass index, despite being the most used method in clinical practice and epidemiological studies to assess excess body fat [21,66], is widely criticized for not correlating with body composition and for being a poor predictor of body fat [9,12,15,60,67]. In a recent review, it was observed that BMI has high specificity (approximately 92%) and low sensitivity (approximately 50%) to detect obesity based on body fat percentage [68,69]. Therefore, more than half of individuals with excess body fat could be mistakenly classified by BMI as having normal BFP [68], which is consistent with the results of the present study.
This fact is concerning because the low sensitivity of BMI indicates that excess adiposity is being underdiagnosed in several individuals. Moreover, because the first step in dealing with a risk factor is the precise identification of the pathophysiological problem, late diagnosis of excess adiposity will delay the treatment of associated comorbidities, as well as the implementation of intervention and control measures [68,70,71], especially in the juvenile population [71].
Although used in several studies with adolescents, WHtR did not show efficient discriminatory power in the evaluation of adolescents with high body fat in the present study. In a systematic review, Lo et al. [72] observed that WHtR did not perform better in predicting cardiometabolic risk factors than anthropometric indicators BMI and WC (waist circumference) in children and adolescents.
Thus, the availability of high sensitivity methods remains a major challenge for the early diagnosis of obesity in adolescents. The model proposed here has predictive and practical advantages in situations with limited resources, such as areas without access to equipment for body composition analysis (bioimpedance, for example). It is therefore an alternative for screening adolescents for excess body fat. In addition, such a method may guide health professionals in decision-making and potentially expedite tests such as lipid profile and insulin resistance, which are associated with high body fat levels and cardiovascular risk.
A limitation of this study is that the test set comprised only approximately 20% of the sample; a larger sample could provide more robust results. However, the main goal of this study was to present a new method for screening obesity in adolescents. The authors believe that external validation studies should be performed in other regions of Brazil and in other countries because of variations in ethnic descent.

Conclusions
The computational model of this study obtained a better performance in the evaluation of excess body fat in adolescents compared to the usual anthropometric indicators, thus presenting itself as a low cost alternative for screening obesity in adolescents living in Brazilian regions where financial resources are scarce.