Factors Influencing the Incidence of Obesity in Australia: A Generalized Ordered Probit Model

The increasing health costs of obesity and the risk factors associated with it are well documented. From this perspective, it is important to analyze the propensity of individuals towards obesity. This paper uses longitudinal data from the Household Income and Labour Dynamics in Australia (HILDA) Survey for 2005 to 2010 to model those variables which condition the probability of being obese. The model estimated is a random effects generalized ordered probit, which exploits two sources of heterogeneity: the individual heterogeneity of panel data models and heterogeneity across body mass index (BMI) categories. The latter is associated with non-parallel thresholds in the generalized ordered model, where the thresholds are functions of the conditioning variables, which comprise economic, social, demographic, and lifestyle variables. To control for potential predisposition to obesity, personality traits augment the empirical model. The results support the view that the probability of obesity is significantly determined by the conditioning variables. In particular, personality is found to be important, and these outcomes reinforce other work examining personality and obesity.


Introduction
One in four Australian adults was obese in 2009, with another one-third being overweight [1]. Over the last two decades, there has been a steady shift in the Australian population towards the higher end of the body mass index (BMI), driven mainly by weight gain rather than by changes in height. The BMI, a simple index of weight for height, is commonly used to classify people as overweight and obese. It is defined as the weight in kilograms divided by the square of the height in metres (kg/m²) [2]. An Australian study suggests that excessive body weight is likely to be costly, with an estimated economic cost, including direct health costs, productivity losses, and carer costs, of A$60 billion per year [3]. The increasing prevalence of obesity is linked to the onset of chronic diseases including type 2 diabetes, hypertension, coronary heart disease, elevated cholesterol levels, depression, and musculoskeletal disorders [4][5][6][7]. Other studies have demonstrated that obesity is strongly associated with a deterioration in health-related quality of life, including both the physical and mental health domains [8]. It has also been demonstrated that obesity negatively affects workforce participation and increases the risk of occupational injury [9][10][11]. This has resulted in a growing demand for research to better understand the factors that determine obesity [12] and the socio-economic impact of being overweight.
This paper explores those factors that influence the incidence of obesity among Australians by way of a random effects generalized ordered probit model. The paper utilizes data from the Household Income and Labour Dynamics in Australia (HILDA) Survey, a household-based annual panel survey.

The Econometric Model
Obesity is usually described in terms of an ordered response model, in which the underlying latent variable is the BMI score [16,19,29].
For the ordered responses, the outcome for a categorical response variable is defined as:

$$y_i \in \{1, 2, \ldots, J\} \qquad (1)$$

where the J outcomes have a natural integer ordering. Further, a latent variable (in this case BMI score), which underlies the response variable, is defined as [30] (p. 655):

$$y_i^{*} = x_i'\beta + \varepsilon_i \qquad (2)$$

where the variables in vector $x_i$ are seen to govern the ordered responses of individuals and, for identification, do not contain a constant. The observed responses can be associated with the underlying latent variable (in this case BMI):

$$y_i = j \quad \text{if} \quad \gamma_{j-1} < y_i^{*} \le \gamma_j, \qquad j = 1, \ldots, J \qquad (3)$$

where the $\gamma_j$ are thresholds or cut points to be estimated. The response probability is given by [31] (p. 520):

$$\Pr(y_i = j \mid x_i) = F(\gamma_j - x_i'\beta) - F(\gamma_{j-1} - x_i'\beta) \qquad (4)$$

with the restriction that $\gamma_0 = -\infty$ and $\gamma_J = \infty$. The function F is an appropriate cumulative distribution function for $\varepsilon_i$. Greene et al. cogently argue that, when modelling BMI category outcomes where those categories are rigidly bounded by WHO guidelines, it might be more appropriate to model ordered responses with flexible boundaries, allowing for sources of individual heterogeneity in terms of the relationship between well-being and BMI category [16,19].
In the generalized ordered response model, the thresholds are not fixed (parallel), but are allowed to vary across individuals. Individual heterogeneity is captured by allowing thresholds to vary with those variables that condition category probability [32]. That is:

$$\gamma_{ij} = \tilde{\gamma}_j + x_i'\delta_j \qquad (5)$$

Substitution of Equation (5) into the cumulative distribution of Equation (4) gives [33]:

$$\Pr(y_i = j \mid x_i) = F(\tilde{\gamma}_j - x_i'\beta_j) - F(\tilde{\gamma}_{j-1} - x_i'\beta_{j-1}) \qquad (6)$$

where $\beta_j = \beta - \delta_j$, leading to a separate set of coefficients for each category. The generalized model of Equation (6) is estimated as a series of J − 1 binary response models [34], proceeding sequentially from the first model, which analyses category 1 versus categories 2, ..., J, to the last model, which analyses categories 1, ..., J − 1 versus category J.
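To make the difference between parallel and non-parallel thresholds concrete, the following is a minimal sketch of the category probabilities of a generalized ordered probit for one individual. It assumes F is the standard normal distribution function; the function names and the numerical values are hypothetical and are not taken from the paper's estimates.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def category_probs(x, cutpoints, betas):
    """Probabilities of J ordered categories for one individual.

    x         : covariate values (no constant)
    cutpoints : the J-1 baseline thresholds, in increasing order
    betas     : J-1 coefficient vectors, one per threshold
                (identical vectors give the standard ordered probit)
    """
    # Cumulative probabilities Pr(y <= j) = F(threshold_j - x'beta_j)
    cum = [norm_cdf(g - sum(b * v for b, v in zip(beta, x)))
           for g, beta in zip(cutpoints, betas)]
    cum = [0.0] + cum + [1.0]  # Pr(y <= 0) = 0 and Pr(y <= J) = 1
    return [cum[j + 1] - cum[j] for j in range(len(cum) - 1)]

# Hypothetical values: two covariates, three categories
x = [1.0, 0.5]
parallel = category_probs(x, [0.0, 1.0], [[0.4, -0.2], [0.4, -0.2]])
generalized = category_probs(x, [0.0, 1.0], [[0.4, -0.2], [0.1, -0.2]])
```

One known caveat of this class of models is that, because each threshold has its own coefficients, the cumulative probabilities are not guaranteed to be increasing for every covariate value, so a fitted generalized model can imply negative category probabilities for some individuals.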
In panel random effects, individual heterogeneity is also introduced by augmenting Equation (6) with the variable $\alpha_i$, which has mean zero and constant variance $\sigma_\alpha^2$. That is, the latent variable is specified as [30] (p. 662):

$$y_{it}^{*} = x_{it}'\beta + \alpha_i + \varepsilon_{it} \qquad (7)$$

leading to the cumulative distribution function [35]:

$$\Pr(y_{it} = j \mid x_{it}, \alpha_i) = F(\tilde{\gamma}_j - x_{it}'\beta_j - \alpha_i) - F(\tilde{\gamma}_{j-1} - x_{it}'\beta_{j-1} - \alpha_i) \qquad (8)$$

where the $\tilde{\gamma}_j$ are the baseline thresholds and individual heterogeneity is captured by the non-parallel cut-offs and the panel random effects component. Assuming $\varepsilon_{it} \mid x_i, \alpha_i \sim N(0, 1)$ in Equation (7), we estimate a random effects generalized ordered probit for a three-category variable based on individual BMI scores for the last five years of the HILDA survey. The three ordered categories, based on the WHO guidelines, are normal, overweight, and obese. The generalized model nests alternative models based on restricting the parameters to be identical between categories. The most specific of these is the standard ordered probit, in which all parameters are identical between categories. We adopt a sequential procedure advocated by Pfarr et al., following Williams, in testing down from the generalized model [34,35]. In the first round, a Wald test is performed on the restriction that all parameters are the same across categories. The model is then re-estimated with the restriction that the least significant parameter in the first round is identical across all categories. The Wald test is then applied again. This process of estimation, testing restrictions, and then applying a restriction to a new estimate proceeds until only parameters that are significantly different over categories remain. The model was estimated using REGOPROBIT2 (Statistical Software Components, Chestnut Hill, MA, USA) [33].
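A single step of the testing-down procedure can be sketched as follows. This is an illustration only, not the REGOPROBIT2 implementation: it tests whether one coefficient is equal across two equations using a one-restriction Wald statistic, and then picks the variable to restrict next. The function names, the inputs, and the 0.05 default are assumptions for the sketch; in the actual procedure the model is re-estimated after each restriction before the tests are repeated.

```python
import math

def wald_equality(b1, b2, var1, var2, cov12=0.0):
    """Wald statistic and p-value for H0: b1 == b2 (one restriction)."""
    diff_var = var1 + var2 - 2.0 * cov12   # variance of (b1 - b2)
    w = (b1 - b2) ** 2 / diff_var
    p = math.erfc(math.sqrt(w / 2.0))      # upper tail of chi-square(1)
    return w, p

def next_restriction(p_values, alpha=0.05):
    """Name of the variable to restrict next, or None to stop.

    p_values maps each still-unrestricted variable to the p-value of
    its parallel lines test in the current round.
    """
    if not p_values:
        return None
    name = max(p_values, key=p_values.get)  # least significant difference
    return name if p_values[name] >= alpha else None

# Hypothetical coefficient estimates for one variable in eq1 and eq2
w, p = wald_equality(1.0, 0.0, 0.5, 0.5)
```

The loop stops once every remaining variable has a p-value below the chosen level, which corresponds to retaining separate coefficients only where they differ significantly across categories.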

Data and Descriptive Statistics
All data come from the HILDA panel. It should be noted that some of the data were imputed following non-response by a panel member or the failure of a household to provide some information. Imputation for different data items, such as income, is undertaken by making use of responses from similar individuals or households [13]. For this paper, the data were further reduced by the researchers, with observations that had missing responses for key variables being dropped to ensure balance. Table 1 presents the list of variables included in the model. According to WHO international classifications, the BMI cut-off points for adults are less than 18.5 for underweight, 18.5 to 25.0 for normal weight, 25.0 to 30.0 for overweight, and 30.0 or more for obese. The category underweight is not considered in the dependent variable ordobese, since our analysis focuses on the overweight and obese categories relative to the normal category of BMI. This paper also uses BMI categories and not the BMI numerical values. The use of BMI categories has been criticized due to a reliance, when calculating BMI, on self-reported height and weight and the possibility of misstatement [17,18,36]. In addition, the use of categories of BMI and not BMI numerical values results in a loss of information. These criticisms are duly noted, with the recognition, as per Greene et al., that the BMI category is likely to be correct [16,18]. The correlation between self-reported and measured weight and height has been found to be very high (greater than 0.9) [37]. In addition, policy-makers are interested in categories and individual movements between the categories rather than in marginal changes within them [16].
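The classification described above can be sketched in a few lines. The function names are hypothetical, and the treatment of values falling exactly on a cut-off (assigned to the higher category) is an assumption of this sketch rather than something stated in the paper.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms over height in metres squared."""
    return weight_kg / height_m ** 2

def bmi_category(value):
    """WHO category for an adult BMI value.

    Assumption: a value exactly on a cut-off goes to the higher category.
    """
    if value < 18.5:
        return "underweight"   # excluded from the dependent variable
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    return "obese"

# Example: 70 kg and 1.75 m give a BMI of about 22.9, which is "normal"
example = bmi_category(bmi(70.0, 1.75))
```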
Age is expected to have a quadratic association with BMI [38]. The general increasing trend of BMI with age may be attributed to age-related losses in lean body mass, resulting in lowered energy expenditure. However, later in life, BMI is expected to decrease with age due to biological mechanisms. To account for this pattern in the relationship between age and BMI, age and age squared terms are included in the model presented in this study.
A significant relationship between education and obesity has been shown in many studies [39,40]; those with higher levels of education have a significantly lower risk of obesity. The variable educ, the self-reported highest level of education attained by participants, was included in this model to capture this relationship and was collapsed into four categories, as outlined in Table 1.
The respondent's employment status, empstatus, was re-coded as a binary variable, scoring 1 for employed and 0 for unemployed or not in the work force. Income is captured by the variable lndinc/p, which is the logarithm of the ratio of household annual disposable income to the number of persons in the household who were included in the survey at the time the data were collected. The covariate losat, satisfaction with life, was collapsed from ten to three categories, primarily to regroup those who rated themselves as dissatisfied with their life. The covariates area (remoteness area) and advantage (the SEIFA 2001 decile of the index of relative socio-economic advantage/disadvantage) are included in the model to capture a likely association of living in remote areas and of relatively low socio-economic status with a higher risk of being overweight. Consumption spending on alcohol (alcohol) and on foods prepared outside the home (meals) is generally associated with increased obesity. Potential differences across household types are captured by the inclusion of the variables marstatus and hhtype, where the latter is designed to control for single parents.
The five personality traits associated with the FFM are included as measures of the health status and personalities of respondents. The panel on obesity runs from 2006 to 2010 inclusive. Personality data were collected for the years 2005 and 2009. Personality scores are relatively stable [41], and scores for the year 2005 were applied to the years 2006 and 2007, while scores for 2009 were applied to the years 2008 and 2010 to complete the panel. This technique is in keeping with Cobb-Clark and Schurer in their study of the FFM from HILDA and their demonstration that personality traits are stable for working-age adults [41]. The FFM is well established in the psychology literature [42] but is used less frequently in econometric work [43]. HILDA respondents are administered a version of the Big Five personality inventory, based on Saucier (1994), using the trait descriptive approach [44]. Respondents were asked how well 36 different adjectives describe them, with 28 used to derive scales of five specific personality traits. Scores for each of the traits are constructed by assigning a value from 1 to 7 to each item, with a higher score indicating that the trait describes the individual better, summing them, and obtaining an average [22,41,42]. The five personality traits are opene, openness to experience; consc, conscientiousness; extrv, extraversion; agree, agreeableness; and emote, emotional stability. The internal reliability coefficients (Cronbach alpha) for these traits were shown by Wooden to be satisfactorily high in HILDA (greater than 0.7) and identical between wave five and wave nine of the survey [22]. Wooden also tested the extent to which these personality traits changed by age between the two survey years (2005 and 2009) and concluded that, for those aged 25 to 64, the personality scores for most individuals do not change much over time [22]. Some work has been done linking personality traits and obesity [20,45,46]. These studies use different personality variables to the FFM; some use the Karolinska Scales of Personality (KSP), while Sullivan et al. use the Temperament and Character Inventory (TCI). Fortunately, the TCI can be linked to the FFM [47].
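The construction of the trait scores and the way the two personality waves are spread over the panel years can be sketched as below. The names are hypothetical, and the mapping of 2009 onto itself is an assumption of the sketch, since the text only states that the 2009 scores were applied to 2008 and 2010.

```python
def trait_score(item_ratings):
    """Average of the 1-7 item ratings belonging to one trait."""
    return sum(item_ratings) / len(item_ratings)

# Panel year -> personality wave whose score is used in that year.
# Assumption: 2009 uses its own wave.
SOURCE_WAVE = {2006: 2005, 2007: 2005, 2008: 2009, 2009: 2009, 2010: 2009}

def fill_panel(scores_by_wave):
    """Spread the 2005 and 2009 trait scores over the years 2006-2010."""
    return {year: scores_by_wave[wave] for year, wave in SOURCE_WAVE.items()}

# Hypothetical respondent: one trait measured in both waves
panel = fill_panel({2005: trait_score([5, 6, 7]), 2009: trait_score([4, 6])})
```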
The mean and standard deviation scores for the continuous variables in Table 1 (age, lndinc/p, alcohol, and meals) have their usual meaning. The personality scores are ordinal but take on 36 different ranks between the values 1 and 7 inclusive; as such, the reported means and standard deviations have the usual interpretation. The means for the binary variables (gender, empstatus, hhtype, and marstatus) give the proportion of the estimation sample scoring 1. The means and standard deviations for the remaining variables, which are all ordered categorically, should be interpreted with caution. The relative frequency distributions for these categorical variables are given in Table 2. In Table 2, advantage is the SEIFA index, which is simply the decile of socio-economic advantage from the lowest to the highest ten percent. The fact that the relative frequencies are all close to ten percent gives an indication of the representativeness of the HILDA sample. The final column gives the distribution of scores over the five panel years for the BMI categories. Table 3 complements this column by giving the transition probabilities between categories between the first and last years of the sample. Reading down the columns in Table 3 gives the BMI category in 2006, and reading across the rows gives the category in 2010. The elements on the principal diagonal give the probability of remaining in the same category: these are relatively large, indicative of stability over time. The off-diagonal elements give the probability of transition between categories. The probability of moving from normal to overweight is 0.152 and the probability of moving from obese to normal is 0.016. These are unconditional transition probabilities. The next section examines the conditional probabilities of being overweight or obese, identified by the random effects generalized ordered probit model.
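Unconditional transition probabilities of the kind reported in Table 3 can be computed from the first-year and last-year categories of each respondent, as in the following sketch. The function name and the toy data are hypothetical and do not reproduce the HILDA figures.

```python
from collections import Counter

CATEGORIES = ("normal", "overweight", "obese")

def transition_probabilities(start, end):
    """Share of respondents in each end category, by start category.

    start, end : equal-length lists with one category per respondent
    Returns a nested dict indexed as result[start_category][end_category].
    """
    counts = Counter(zip(start, end))   # joint counts of (start, end)
    totals = Counter(start)             # respondents per start category
    return {a: {b: counts[(a, b)] / totals[a] for b in CATEGORIES}
            for a in CATEGORIES if totals[a] > 0}

# Toy data for eight hypothetical respondents
first_year = ["normal"] * 4 + ["overweight"] * 2 + ["obese"] * 2
last_year = ["normal", "normal", "normal", "overweight",
             "overweight", "obese", "obese", "obese"]
table = transition_probabilities(first_year, last_year)
```

Each inner dictionary sums to one, so the diagonal entries (for example `table["normal"]["normal"]`) play the role of the persistence probabilities discussed above.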

Results
The random effects specification was applied to Equation (6), and this model was estimated without restriction. Recalling Section 2, a sequence of Wald tests was applied to this unrestricted model, where each test is of the parallel lines restriction on a variable. The variable with the highest probability value was then restricted and the model re-estimated with the restriction, followed by a further round of tests of the parallel lines restriction on the remaining variables. This process of testing and then restricting proceeded until only those variables with a probability score of less than 0.05 in the Wald test remained. See Table A1 of Appendix A for the probability scores of the sequential Wald tests for all variables. Six variables were identified to which the parallel lines restriction applied. That is, the estimated coefficients for these variables were deemed to be the same for both equations in the model. Table 4 gives a Wald test on jointly restricting these variables to have the same coefficient values over the two equations and clearly indicates that these restrictions cannot be rejected. The results for the generalized ordered probit of Equation (6) from Section 2, but estimated with the parallel lines assumption of Table 4, are given in Table 5. The results for the generalized ordered probit without parallel lines restrictions are reported without comment in Table A2 of Appendix A. The first two columns of Table 5 give the coefficients and standard errors for eq1, with the category normal in the variable ordobese against the two categories overweight and obese. The following two columns of coefficients and standard errors are for eq2, with the categories normal and overweight against the category obese. The underlined coefficients are for those variables where the parallel lines restriction is not relaxed according to the test outcomes of Table 4.
It should be noted that the coefficients for three of these variables, hhtype, alcohol, and meals, test as not significantly different from zero in both eq1 and eq2. Further, the estimates of the unrestricted generalized ordered probit, as shown in Table A2 of Appendix A, show that the remaining three variables with parallel lines restrictions, area, agree, and emote, have estimated coefficients which are all significant at the 1% level in both equations and have similar magnitudes in both equations for the unrestricted estimates.
Before moving on to a detailed description of the estimated coefficients and their implications for the BMI categories, it would be useful to deal with two statistics reported in the header and footer of Table 5. The Wald test at the header of the table is the usual model test with the slope parameters jointly restricted to zero. Note here that there are 30 slope parameters, as the model is estimated conditional on six parameters being common to both eq1 and eq2. In the footer of the table, ρ is the ratio $\rho = \sigma_\alpha^2 / (\sigma_\alpha^2 + \sigma_\varepsilon^2)$, where $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$ are the variance of the unobserved individual effect and the idiosyncratic error, respectively. The statistic ρ is restricted to the unit interval and it gives the proportion of the total variance given by the unobserved individual effect. It is the correlation of overweight and obese over time for individuals and is indicative of the level of persistence in the overweight/obese category against normal weight and the obese category against the normal/overweight category for individuals [48,49]. Here the score, at 0.852, is close to 1.0 and is indicative of high persistence for individuals over time. Ten variables are significant at the 1% level in both equations. Of these, three are restricted to having fixed coefficients for both equations: area, agree, and emote. Remoteness area, area, is ordered categorically, and the ordering is over increasing remoteness. Thus probability in both equations is increasing with remoteness. Recall that the personality scores are coded from 1 to 7, with low scores reflecting negative aspects of the trait and high scores reflecting positive aspects of the trait. Increasing emotional stability, emote, is associated with lower probability, but the reverse is true for agreeableness, agree, with increasing agreeableness indicating a higher probability.
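As a worked illustration of the persistence statistic ρ, and assuming the probit normalization of the idiosyncratic error variance to one stated in Section 2, the reported value of 0.852 can be inverted to give the implied variance of the individual effect. The rounding is approximate and the calculation is a sketch rather than a figure reported in the paper.

```latex
\rho = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2},
\qquad \sigma_\varepsilon^2 = 1
\;\Longrightarrow\;
\sigma_\alpha^2 = \frac{\rho}{1 - \rho} = \frac{0.852}{0.148} \approx 5.76
```

On this reading, the variance of the unobserved individual effect is nearly six times that of the idiosyncratic error, which is what underlies the high persistence noted above.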
The seven variables for which the parallel lines restriction does not apply and which are significant in both equations are age, age2, educ, advantage, marstatus, consc, and opene. The outcomes for age and age2 indicate that probability is increasing with age, but non-linearly, in both equations. The education variable, educ, is an ordinal scale of qualification achieved, ranked from highest to lowest. The positive sign is expected, and probability is decreasing with educational achievement [39,40]. The ordinal scale for advantage, the SEIFA deciles of relative socio-economic advantage, is increasing, and the negative sign is expected. Being in a married or de facto relationship, marstatus, shifts the probability up in both equations. A rationale for this may be in terms of marriage markets; once married, competition for a partner ceases, and individuals become less concerned with appearance. That is, marriage causes weight gain. However, there is also the possibility of marriage selection, in which lean people are more likely to be selected in marriage [50], although Lin et al. found some evidence to suggest that when single females are faced with adverse marriage market conditions (low male to female ratios), they have less incentive to remain fit and healthy [51]. The discussion of the two personality traits, consc and opene, is deferred to a joint examination of the implications of the results for all five psychological variables.
Two variables are significant in eq2 but not in eq1. There is a significant downward shift in the probability of being obese, relative to falling into the normal/overweight category, that is associated with being employed, empstatus. The categorical variable losat is positively ordered in life satisfaction; that is, higher scores indicate increased life satisfaction. The outcome of a negative relationship with probability in eq2 is expected. Three variables are significant in eq1 but not in eq2: gender, lndinc/p, and extrv. All three coefficients are positive, so that probability increases in eq1 but not in eq2. The results indicate that being male increases the likelihood of being overweight. Further, the likelihood of being overweight increases with household disposable income per person, lndinc/p.
All of the FFM personality traits are significant, with the exception of extraversion in eq2. Three have negative coefficients: consc, emote, and opene. That is, the probability of being overweight/obese is decreasing with increasing conscientiousness, emotional stability, and openness to ideas. Given the nature of these traits, the negative sign would be expected. Similar findings on the association of conscientiousness with BMI and body weight were reported by Kim (2016). That study used participants from the National Longitudinal Study of Adolescent to Adult Health (Add Health) and concluded that a one standard deviation increase in conscientiousness was associated with a decrease in BMI of 0.89 and a 12% reduction in the probability of being obese [21]. There is nothing in the nature of the remaining traits, agreeableness and extraversion, which would indicate any conditioning of probability. However, the results here replicate the results for the community in Sullivan et al. [20]. They found that being obese was positively associated with novelty seeking and reduced reward dependence, which parallels the positive sign for extraversion here. Further, the negative signs for conscientiousness and emotional stability are replicated in Sullivan et al., with lower persistence and self-directedness being associated with obesity [20].
This study had a number of limitations. The analysis relied solely on HILDA panel data, without any augmentation to link to other datasets for one or more periods, such as data on health campaigns at national or state level. In addition, within the HILDA dataset, households may have comprised one or more persons, but no attempt was made in the analysis to group respondents by household. A third limitation was the exclusion of "underweight" due to the small proportion of respondents in this category, and the grouping of persons with BMI > 30 into the obese category with no further attempt to include another category of morbidly obese.

Conclusions
A generalized ordered probit model was estimated to identify those factors that influence being overweight or obese relative to normal weight. The parallel lines assumption of the standard ordered probit model could not be dismissed for six of the 18 conditioning variables, although three of these did not have coefficients significantly different from zero. This suggests that the generalized ordered probit, with partial restriction, is the appropriate specification. The probability of being overweight/obese relative to normal weight and the probability of being obese relative to normal/overweight were found to be conditioned by 13 and 12 of the 18 candidate variables, respectively. The personality traits of the FFM were included in the set of covariates to control for a predisposition to being obese. These personality traits were found to be significant in conditioning probability and replicated the outcomes of another study in which different traits, but ones correlated with the FFM, were used. This finding may have important implications. If personality traits are indicative of a predisposition to being obese, then any policy mechanism designed to combat obesity must take this into account [27]. Policies which concentrate on lifestyle choice and economic and social factors may be inefficient if the relationship between obesity and personality is ignored.

Conflicts of Interest:
The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A
Table A1 gives the results of the sequential parameter restriction tests to identify those variables to which the parallel lines restriction may apply. Note that the chosen probability is 0.05. It should also be noted that had the 0.1 level been used, the number of restrictions would have been the same. The results for the unrestricted model are given in Table A2. The structure of this table is identical to that of Table 5 in the text. The estimates of Table A2 are very similar to the restricted model and this could be taken as indicative of a relatively robust estimate.