Predictors of Body Mass Index among Pregnant Women in Nigeria: A Comparison of Ordinary Least Squares Regression and Quantile Regression Models Using Machine Learning Approach

Poor nutrition during pregnancy is a major public health problem. Maternal undernutrition is a significant risk factor for maternal morbidity, mortality, poor birth outcomes (e.g. low birth weight), and infant mortality. Maternal undernutrition is defined as having a body mass index (BMI) <18.5 kg/m². Previous studies on maternal BMI used classical statistical approaches, whose criteria for model assessment are goodness-of-fit tests and residual examination. The aim of this study was to identify predictors of BMI among pregnant women in Nigeria, and to compare the performance of ordinary least squares (OLS) regression and quantile regression using a machine learning approach. This study used data from the 2013 Nigeria Demographic and Health Survey. A total of 3,049 pregnant women were included in the study. Data were summarized using descriptive statistics. The assumption of normality of the outcome variable (BMI) was tested using the one-sample Kolmogorov-Smirnov test. Bivariate associations of BMI with independent variables were assessed using robust (nonparametric) statistical techniques: Kendall’s tau correlation for continuous predictors, the Wilcoxon rank sum test for binary predictors and the Kruskal-Wallis test for multinomial predictors. Predictors of maternal BMI were investigated using OLS and quantile regression analyses. Model assessment was made using 10-fold cross-validation. A two-tailed p-value <0.05 was considered statistically significant. The respondents had a mean age of 28.22 ± 6.30 years, and a mean BMI of 23.81 ± 4.18 kg/m². Multivariate analyses identified respondent’s age, duration of pregnancy, wealth class, and residence as predictors of maternal BMI. The cross-validated mean squared error for the OLS regression model was lower than that for the quantile regression model. In conclusion, respondent’s age, duration of pregnancy, wealth class, and residence were significantly associated with maternal BMI, and the OLS regression model fit the data better than the quantile regression model.
Citation: Ajayi DT, Bello S (2018) Predictors of Body Mass Index among Pregnant Women in Nigeria: A Comparison of Ordinary Least Squares Regression and Quantile Regression Models Using Machine Learning Approach. J Biom Biostat 9: 402. doi: 10.4172/2155-6180.1000402


Introduction
Poor nutrition during pregnancy is a major public health problem. Maternal undernutrition is a significant risk factor for maternal morbidity, mortality, poor birth outcomes (e.g. low birth weight), and infant mortality [1]. Undernutrition is responsible for more than 3.5 million deaths of mothers and children under the age of 5 each year in developing countries [2]. The prevalence of maternal undernutrition in Nigeria is 6.7% [3].
Maternal nutrition refers to the nutritional needs of women during antenatal and postnatal periods and sometimes also to the period prior to conception (i.e. during adolescence). Maternal undernutrition (or chronic energy deficit) is defined as having a body mass index (BMI) <18.5 kg/m² [1]. Maternal nutrition plays a critical role in fetal growth and development. The intrauterine environment is a major determinant of fetal growth. Studies have shown that maternal undernutrition during pregnancy reduces placental and fetal growth, a condition known as intrauterine growth restriction (IUGR) [4,5]. Maternal undernutrition causes clinical complications in fetuses and infants. For example, about 50% of nonmalformed stillbirths in humans are attributed to IUGR [6]. Moreover, perinatal mortality rates are 5-30 times greater in infants who weigh <2.5 kg at birth than in those who have average birth weights [6]. Infants with IUGR are more likely to develop neurological, respiratory, intestinal, and circulatory disorders than those without IUGR [5].
Predictors of maternal nutrition have been investigated. In a review of studies on maternal and child health indicators in the South Asian region, Bhutta et al. [7] concluded that maternal illiteracy, poverty, and lack of empowerment of women were factors associated with maternal undernutrition. Begum and Sen [8] reported that education, exposure to media, and domestic decision-making were associated with maternal nutritional status in Bangladesh. Senbanjo and colleagues [3] identified age at first birth, maternal education level, and number of births as determinants of maternal nutritional status in Nigeria. However, previous studies employed classical statistical methods, such as ordinary least squares (OLS) regression and logistic regression, where models are validated using goodness-of-fit tests and residual examination. Breiman et al. [9] argued that the predictive accuracy of a statistical technique on a test data set is the appropriate criterion for how good the model is, and this is the hallmark of the machine learning approach.
Machine learning entails estimating the systematic relationship (f) between an outcome variable and input variable(s) using a subset of the data set (the training set), and assessing the model performance on the hold-out set (observations not used in fitting or training the statistical model) [10]. In the regression setting, the most commonly used measure of model performance or accuracy is the mean squared error (MSE), given by:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{f}(x_i)\right)^2$$
where $n$ is the number of observations, $y_i$ represents the response variable for the ith observation, $\hat{f}$ is the estimate of f, and $\hat{f}(x_i)$ is the prediction that $\hat{f}$ gives for the ith observation [10]. Cross-validation (CV) is a widely used technique for model assessment [10]. k-fold CV involves randomly dividing the set of observations into k folds of approximately equal size. The first fold is treated as a validation set (test set), and the method is fit on the remaining k−1 folds. The MSE is then computed on the observations in the held-out fold. This procedure is repeated k times; each time, a different group of observations is treated as a validation set. This process results in k estimates of the test error, and the k-fold CV estimate is computed by averaging these values [10].
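The k-fold procedure described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R code used for the study; the one-predictor least-squares fit, the function names (`fit_line`, `mse`, `kfold_cv_mse`), and the synthetic data are all assumptions made for the example and do not come from the NDHS analysis.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mse(ys, preds):
    """Mean squared error between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def kfold_cv_mse(xs, ys, k=10, seed=0):
    """Return (average of the k fold MSEs, list of fold MSEs)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)           # random assignment to folds
    folds = [idx[i::k] for i in range(k)]      # k folds of roughly equal size
    fold_mses = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the k-1 training folds only.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Score on the held-out fold.
        preds = [a + b * xs[i] for i in held_out]
        fold_mses.append(mse([ys[i] for i in held_out], preds))
    return sum(fold_mses) / k, fold_mses

# Synthetic illustration only (not study data): outcome rises with x plus noise.
rng = random.Random(1)
xs = [rng.uniform(15, 49) for _ in range(200)]
ys = [20 + 0.1 * x + rng.gauss(0, 2) for x in xs]
cv_mse, fold_mses = kfold_cv_mse(xs, ys, k=10)
print(f"10-fold CV MSE: {cv_mse:.3f}")
```

Each observation is held out exactly once, so every prediction that enters the CV estimate comes from a model that never saw that observation during fitting.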
Studies have shown that BMI distribution is skewed; thus, researchers employed quantile regression to investigate factors associated with BMI, in addition to OLS regression [11-13]. Quantile regression is a semi-parametric technique that models quantiles of the response variable conditional on the covariates [14]. Quantile regression allows a comprehensive evaluation of the associations between predictor(s) and the outcome at various quantiles (or percentiles). Unlike OLS regression, quantile regression makes no assumptions about the distribution of the errors; thus, it is more robust to non-normal errors and outliers [15].
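To make the contrast with OLS concrete: quantile regression is usually formulated as minimizing the check (pinball) loss at a chosen quantile level, in place of the squared error. The Python sketch below shows only the intercept-only case, where the minimizing constant is a sample quantile. The function names and the small right-skewed sample are illustrative assumptions, not study data, and this is not the estimation routine used by the authors.

```python
def pinball_loss(ys, preds, tau):
    """Average check (pinball) loss at quantile level tau (0 < tau < 1)."""
    total = 0.0
    for y, p in zip(ys, preds):
        r = y - p
        # Under-predictions are weighted by tau, over-predictions by 1 - tau.
        total += tau * r if r >= 0 else (tau - 1) * r
    return total / len(ys)

def best_constant(ys, tau):
    """Observed value that minimizes the pinball loss as a constant prediction."""
    candidates = sorted(set(ys))
    return min(candidates, key=lambda c: pinball_loss(ys, [c] * len(ys), tau))

# Small right-skewed toy sample (made up for illustration).
toy = [18, 19, 20, 21, 22, 23, 24, 30, 45]
median_fit = best_constant(toy, 0.5)   # the sample median
upper_fit = best_constant(toy, 0.9)    # an upper quantile
mean_fit = sum(toy) / len(toy)         # what squared error would target
print(median_fit, upper_fit, round(mean_fit, 2))
```

In this toy sample the single large value pulls the mean above the median, while the 0.5-level fit stays at the middle observation, which is the sense in which quantile fits are described as robust to outliers.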
Studies comparing the predictive performance of quantile regression models with different quantiles are lacking. Moreover, studies comparing OLS regression and quantile regression models using CV are not available in the literature. Therefore, the aim of this study was to identify predictors of BMI among pregnant women in Nigeria, and to compare the performance of OLS regression and quantile regression using a machine learning approach.

Methods
This study used data from the 2013 Nigeria Demographic and Health Survey (NDHS), implemented by the National Population Commission (NPC) in conjunction with ICF International. The survey, a population-based cross-sectional study, employed stratified three-stage cluster sampling to select the respondents. Respondents were selected from 904 clusters, comprising 372 urban areas and 532 rural areas in 36 states and the Federal Capital Territory, Abuja, Nigeria. A total of 38,868 women aged 15-49 years and 17,317 men aged 15-49 years were interviewed. A detailed description of the sample design and implementation is available in the 2013 NDHS report [16].
The Individual Recode data set was used. A total of 3,049 pregnant women were included in the analysis, after listwise deletion of missing values. Data were summarized using descriptive statistics. The assumption of normality of the outcome variable (BMI) was tested using the one-sample Kolmogorov-Smirnov test. Bivariate associations of BMI with independent variables were assessed using robust (nonparametric) statistical techniques: Kendall's tau correlation for continuous predictors, the Wilcoxon rank sum test for binary predictors and the Kruskal-Wallis test for multinomial predictors. Predictors having a p-value <0.2 in bivariate analysis were included in the multivariate model [17]. Multivariate outliers were identified using the robust Mahalanobis distance. Predictors of maternal BMI were investigated using OLS and quantile regression analyses. Multiple quantile regression models were fitted using quantiles from 0.10 to 0.95 in increments of 0.05. Tenfold CV was performed to select the best quantile regression model using MSE as the performance criterion. The performance of the OLS regression model and the selected quantile model was compared using 10-fold cross-validation. A two-tailed p-value <0.05 was considered statistically significant. All analyses were done in R, version 3.4.4 (R Core Team, Vienna, Austria).
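The quantile grid and the selection rule described above can be written down directly. The Python sketch below is a hedged illustration, not the authors' R code: `quantile_grid` and `select_best` are hypothetical helper names, and the scores passed to `select_best` are made-up placeholders, not the values reported in Table 4.

```python
def quantile_grid(start=0.10, stop=0.95, step=0.05):
    """Quantile levels from start to stop (inclusive) in steps of `step`."""
    n = round((stop - start) / step) + 1
    # Round to two decimals to avoid floating-point drift such as 0.15000000000000002.
    return [round(start + i * step, 2) for i in range(n)]

def select_best(cv_mse_by_tau):
    """Return the quantile level whose cross-validated MSE is lowest."""
    return min(cv_mse_by_tau, key=cv_mse_by_tau.get)

taus = quantile_grid()
print(len(taus), taus[0], taus[-1])  # 18 0.1 0.95

# Placeholder scores for three levels only (NOT the study's results).
toy_scores = {0.25: 3.0, 0.50: 1.0, 0.75: 2.0}
print(select_best(toy_scores))  # 0.5
```

The grid contains 18 levels, which matches the number of quantile models the paper reports comparing.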

Results

Sample characteristics
The mean age of the respondents was 28.22 ± 6.30 years (Table 1). Most (99.1%) of the respondents were married (Table 2). About one-half (50.6%) of the respondents had no formal education. About 61% of respondents were receiving antenatal care and 40.3% were in the second trimester. About a third (31%) of the respondents resided in urban areas.
The mean BMI of the respondents was 23.81 ± 4.18 kg/m². The BMI was right skewed (skewness = 1.52), and the Kolmogorov-Smirnov test showed a significant deviation from normality (p-value <0.001). Figure 1 shows the distribution of maternal BMI.
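For readers unfamiliar with the skewness statistic, the Python sketch below computes the moment-based sample skewness (third central moment divided by the 1.5 power of the second). The paper does not state which skewness estimator was used, so this particular formula is an assumption, and the values are toy numbers rather than study data.

```python
def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population (1/n) moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [18, 19, 20, 21, 22, 23, 24, 30, 45]  # long upper tail
print(sample_skewness(symmetric))         # zero for a symmetric sample
print(sample_skewness(right_skewed) > 0)  # positive for a long right tail
```

A positive value, such as the 1.52 reported here, indicates a longer upper tail, that is, a minority of respondents with BMI well above the bulk of the distribution.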

Bivariate analyses
Table 3 shows the results of bivariate analyses of factors associated with BMI among pregnant women. Respondent's age, respondent's education level, respondent's employment status, number of living children, duration of pregnancy, antenatal care, partner's employment status, wealth class, residence, and region were significantly associated with maternal BMI.

Multivariate analyses
Figure 2 shows the plot of the robust Mahalanobis distance against the observation index. The plot implied the presence of several outliers in the predictor space. The 10-fold CV approach demonstrated that the 0.55 quantile (55th percentile) had the lowest MSE among the 18 quantile models tested (Table 4). Table 5 shows the results of multivariate analyses of factors associated with maternal BMI. Similar to the results of OLS regression, quantile regression revealed that respondent's age, duration of pregnancy, wealth class, and residence were significant predictors of maternal BMI. Maternal BMI increased with age. Respondents in the second and third trimesters had higher BMI than those in the first trimester. BMI was higher among respondents in the rich wealth class than among those in the poor wealth class. The cross-validated MSE for the OLS regression model was lower than that for the 0.55 quantile regression model (14.396 vs. 14.431).
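The size of the gap between the two cross-validated MSE values can be put on a more interpretable scale. The short Python calculation below uses only the two figures reported above; converting them to root mean squared error (RMSE, in kg/m²) is an added illustration and is not part of the original analysis.

```python
import math

mse_ols = 14.396   # cross-validated MSE, OLS model (reported above)
mse_qr = 14.431    # cross-validated MSE, 0.55 quantile model (reported above)

abs_diff = mse_qr - mse_ols          # absolute gap in MSE
rel_diff = abs_diff / mse_ols        # gap relative to the OLS MSE
rmse_ols = math.sqrt(mse_ols)        # typical prediction error in kg/m^2
rmse_qr = math.sqrt(mse_qr)

print(f"MSE gap: {abs_diff:.3f} ({100 * rel_diff:.2f}% of the OLS MSE)")
print(f"RMSE: OLS {rmse_ols:.3f} vs quantile {rmse_qr:.3f} kg/m^2")
```

On the RMSE scale both models miss by roughly 3.8 kg/m² on held-out observations, and the two RMSE values differ only in the third decimal place.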

Discussion
This study demonstrated that, among the quantile regression models, the 0.55 quantile model had the highest predictive performance. At this quantile, respondent's age, duration of pregnancy, wealth class, and residence were significant predictors of maternal BMI. OLS regression also identified respondent's age, duration of pregnancy, wealth class, and residence as significant predictors of maternal BMI.
The finding that wealth class was associated with maternal BMI is consistent with the result of Bhutta et al. [7]. The significant association of residence with maternal BMI observed in this study agrees with the report of Senbanjo and colleagues from Nigeria [3]. In contrast to the findings of Bhutta et al. [7] and Senbanjo et al. [3], maternal education was not significantly associated with maternal BMI. Similar to the findings of Kusin et al. [18], number of living children and birth interval were not associated with maternal BMI in this study.
This study also found that the OLS regression model had better predictive accuracy than the quantile regression model. It has been recommended that OLS regression should not be used when the assumption of normality is violated or when the data contain outliers [19]. In such cases, Madadizadeh et al. [20] suggested the use of quantile regression instead. Although maternal BMI violated the assumption of normality and the data had many outliers, the OLS regression model demonstrated better fit than the quantile regression model when assessed using the machine learning approach. This finding upholds Breiman's proposition that model assessment and model selection should be based on predictive accuracy [9].
This study had some limitations. For example, the study used NDHS data, the study design of which was cross-sectional; thus, the temporal sequence of the observed associations of predictors with maternal BMI could not be established. Analytic epidemiological studies (e.g. cohort design) would be more suitable for establishing the temporal sequence. Moreover, the determinants of maternal BMI investigated were not exhaustive because the data set was secondary. Other factors that might influence maternal BMI, such as disease state, health education, and access to health care, could not be investigated. In spite of these limitations, this study used population-based data; thus, the findings have considerable external validity. Also, this study provides evidence on model assessment using the machine learning approach.

Figure 2: A plot of the robust Mahalanobis distance against the observation index.

Table 3: Bivariate analyses of factors associated with maternal BMI.

Table 4: Results of 10-fold cross-validation of quantile regression models.

Table 5: Multivariate analyses of factors associated with BMI - Quantile regression vs. OLS regression.

In conclusion, respondent's age, duration of pregnancy, wealth class, and residence were significantly associated with maternal BMI. The OLS regression model fit the data better than the quantile regression model. Predictive accuracy is a more suitable criterion for model assessment than the classical goodness-of-fit test or residual examination.

5. Wu G, Bazer FW, Cudd TA, Meininger CJ, Spencer TE (2004) Maternal nutrition and fetal development. J Nutr 134: 2169-2172.