An Instrumental Variable Probit (IVP) Analysis on Depressed Mood in Korea: The Impact of Gender Differences and Other Socio-Economic Factors

Background - Depression is a mental health condition whose frequency has been increasing in modern societies. It imposes a great burden because of its strong impact on people's quality of life and happiness. Depression can be reliably diagnosed and treated in primary care: if more people received effective treatment earlier, the costs related to depression could be reduced. The aim of this study was to examine the influence of socio-economic factors and gender on depressed mood in Korea. In spite of the large number of empirical studies carried out for other countries, few epidemiological studies have examined the socio-economic determinants of depression in Korea, and those that have were either limited to samples of employed women or did not control for individual health status. Moreover, as likely endogeneity (i.e. correlation between an explanatory variable and the error term, for example as a result of simultaneity: here, depressed mood may be caused by health factors that, in turn, might be caused by depression) might bias the results, the present study proposes an empirical approach based on instrumental variables to deal with this problem. Methods - Data for the year 2008 from the Korea National Health and Nutrition Examination Survey (KNHANES) were employed. About seven thousand people (N = 6,751, of whom 43% were male and 57% female), aged 19 to 75 years, were included in the analysis sample. To take into account the possible endogeneity of some explanatory variables, two Instrumental Variables Probit (IVP) regressions were estimated; the variables for which instrumental equations were estimated were women's participation in the workforce and self-reported good health.
Explanatory variables were related to age, gender, family factors (such as the number of family members and marital status) and socio-economic factors (such as education, residence in metropolitan areas, and so on). As the Wald tests carried out after the estimations did not allow us to reject the null hypothesis of exogeneity, a probit model was also estimated. Results - Overall, women tend to develop depression more frequently than men. Education is inversely associated with depressed mood (the probit marginal effect of high school education on the probability of reporting a depressed mood is -24.6%), while marital status and the number of family members may act as protective factors (a marginal effect of -1.0% on the probability of reporting a depressed mood for each additional family member). Depression is significantly associated with socio-economic conditions, such as work and income. Living in metropolitan areas is inversely correlated with depression (a marginal effect of -4.1% on the probability of reporting a depressed mood in the probit model); this could be explained by the fact that people in rural areas rarely have immediate access to high-quality health services. Conclusion - This study outlines the factors that are most likely to affect depression and applies an IVP model to take into account the potential endogeneity of some predictors of depressed mood, such as female participation in the workforce and health status. A probit model was also estimated. Depression is associated with a wide range of socio-economic factors, although the strength and direction of the association can differ by gender. Prevention approaches aimed at countering depressive symptoms could take into account the evidence offered by the present study.


Implications for policy makers
• The results of the paper help identify which people are more likely to develop depression and the role of gender and education level: these factors should be taken into account when assessing the benefits that could be obtained from support and counseling programs.
• Awareness of the social impact of depression highlights the importance of investing in counseling programs. Counseling programs and other public health measures against depression cost less than depression itself.
• Mental illness also needs to be monitored in rural areas, especially where there are large gaps between urban and rural development, as in Korea.

Implications for public
Depression is likely to strike a large fraction of the population, so support and counseling programs should be easy to implement and to access. Support programs should be easily accessible in rural areas too, to guarantee equal treatment of mental illness.

Background
Mental health disorders, depression among them, are likely to impose a great burden on modern societies (1). Depression brings the human costs of sadness, a sense of isolation, inability to enjoy life and, for 35,000 people every year, attempted suicide (2). Depressed individuals are five times more likely to abuse drugs (3,4). In addition to human costs, economic costs must be taken into account: people lose 5.6 hours of productive work every week when they are depressed (5), and it has been estimated that half of the loss of work productivity is due to absenteeism and short-term disability (6,7). People with symptoms of depression are 2.17 times more likely to take sick days (8) and, when they are at work, their productivity is impaired to a degree that depends on the severity of the depression. Workplace costs of depression have been estimated at over $34 billion per year (9). Costs of absenteeism were directly related to actually taking antidepressant medication (10): people who took the prescribed medication had 20% lower absenteeism costs.
Lastly, it is well documented that poor mental health contributes to unemployment. The unemployed tend to have higher levels of impaired mental health, including anxiety and depression, as well as higher rates of mental health hospital admission, premature mortality, and chronic disease (11). If people with affective symptoms and depression sought treatment earlier, and received effective treatment, both the human and the economic costs could be reduced (3).
Depression is a disorder that can be reliably diagnosed and treated in primary care: as outlined in the World Health Organization (WHO) mhGAP Intervention Guide (12), preferred treatment options consist of basic psychosocial support combined with antidepressant medication or psychotherapy, such as cognitive behavioral therapy, interpersonal psychotherapy or problem-solving treatment. Depression also strikes more women than men: its prevalence is one to five times greater in females (13)(14)(15)(16)(17)(18)(19). This gender gap can be explained by genetic, neurohormonal, psychobiological (20) and social factors (21), the latter being far more predictive of gender gaps in depression than genetic or hormonal factors. Gender gaps have also been identified in the literature in terms of men's lower likelihood of seeking care; a recent study observed that women's higher use of care depends on the type of care provider, with greater gender inequality in the use of primary healthcare (22). Further, women might be more likely to experience both role and financial strains, which may be associated, for example, with child-care difficulties, the need to juggle work and family responsibilities, and single parenthood (23)(24)(25). This study focuses on the individual determinants of depressive behavior and its gender differences using data from South Korea. Few epidemiological studies have examined the determinants of depression in Korea so far, and they were either limited to samples of employed women or did not control for individual health status (26)(27)(28)(29). Previous research reported the prevalence of depressive disorders in several world regions, but Korean statistics were not included. In the geographic region including Eastern Asian countries such as China and Vietnam, the share of people aged 19-79 experiencing at least 10 depressive episodes ranged between 18.5% and 24%.
In Southeast Asia (e.g. Thailand and Indonesia), this percentage was lower, ranging between 13% and 18% (30). The choice of covariates was based on previous empirical studies of depression. In addition to gender, age is among the determinants of depressed mood (31,32). Empirical analyses have sometimes revealed a modest effect of age on mental health (33), sometimes no evidence at all that depression increases with age (34), and at other times a strong association between age and the prevalence of depressive symptoms (35). In this paper, to allow for a non-linear association between age and depression, both age and age squared were included in the empirical model (19). The positive association between social relationships and health has been stressed in many studies: a pioneering study treated social relationships as a risk factor for health, within a broad notion of health that includes mental health (36). Depression can be correlated with household size and income. Household size is a proxy for family support, which is expected to influence respondents' psychological well-being. The frequency of affective symptoms has been found to be lower in large families, although it is higher when household income is low (37-39)¹.
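Because age enters the model both linearly and squared, its effect is not constant across ages. A minimal Python sketch of the arithmetic this specification implies: the slope of the latent index in age is b_age + 2·b_age_sq·age, and the age profile turns at age* = -b_age / (2·b_age_sq). The coefficient values in the usage lines are hypothetical placeholders, not estimates from this study; in a probit, these quantities refer to the latent index, and the effect on the probability is further scaled by the normal density.

```python
def age_index_slope(b_age: float, b_age_sq: float, age: float) -> float:
    """Derivative of b_age*age + b_age_sq*age**2 with respect to age (latent-index scale)."""
    return b_age + 2.0 * b_age_sq * age


def age_turning_point(b_age: float, b_age_sq: float) -> float:
    """Age at which the quadratic profile peaks (b_age_sq < 0) or bottoms out (b_age_sq > 0)."""
    if b_age_sq == 0.0:
        raise ValueError("no turning point when the squared term is zero")
    return -b_age / (2.0 * b_age_sq)


if __name__ == "__main__":
    # Hypothetical coefficients: positive on age, negative on age squared.
    b1, b2 = 0.05, -0.0005
    print(age_turning_point(b1, b2))      # age at which the index peaks
    print(age_index_slope(b1, b2, 30.0))  # slope of the index at age 30
```

With a positive linear term and a negative squared term, the index rises with age up to the turning point and declines afterwards; a very small squared coefficient pushes the turning point outside the 19-75 age range.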
A number of studies also control for marital status. Marriage can act as a protective factor against depression (40,41). Depression is more common among unmarried, single and divorced people and, especially, among widows (33). Marital status entered the estimations as a dummy variable equal to 1 if the respondent was married and 0 otherwise. To our knowledge, living in urban areas as a source of depression has been neglected in the empirical literature on depression.
To control for regional heterogeneity in lifestyles and environmental conditions, a regional dummy for residents of metropolitan areas (Seoul, Incheon and Gyeonggi-do) was included. In Korea, these areas are more densely populated than in any other country: for example, 25,620,000 people, 49.5% of the Korean population, live in an area of 11,806 km², 11.8% of the country's territory². As a result, while the largest cities might face higher congestion costs than rural areas, their residents have immediate access to higher-quality services. Korea also faces inequality issues, as productive activities are highly concentrated in metropolitan areas. Physical illness should also be considered (25), since depression often co-occurs with other conditions (42).
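A quick check of the density figures quoted above, as a short Python computation. It uses only the four numbers given in the text; the national totals and the density outside the metropolitan areas are derived from the stated shares, so they are approximate.

```python
# Figures quoted in the text for Seoul, Incheon and Gyeonggi-do.
METRO_POP = 25_620_000    # residents
METRO_AREA_KM2 = 11_806   # square kilometres
METRO_POP_SHARE = 0.495   # share of the national population
METRO_AREA_SHARE = 0.118  # share of the national territory

# Implied national totals (derived from the rounded shares, hence approximate).
national_pop = METRO_POP / METRO_POP_SHARE
national_area = METRO_AREA_KM2 / METRO_AREA_SHARE

metro_density = METRO_POP / METRO_AREA_KM2
rest_density = (national_pop - METRO_POP) / (national_area - METRO_AREA_KM2)

print(f"metropolitan areas:  {metro_density:,.0f} people/km2")
print(f"rest of the country: {rest_density:,.0f} people/km2")
print(f"ratio: {metro_density / rest_density:.1f}x")
```

With these inputs, the metropolitan areas come out at roughly 2,170 people per km², against roughly 300 in the rest of the country, i.e. about seven times as dense.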
Poor physical health might lead to depression in later stages of life (43). Here, health status is assessed through the following survey question: "How is your health in general: would you say it is…?" The available answers, on a Likert scale, were: "very good" = 1, "good" = 2, "not bad" = 3, "bad" = 4, "very bad" = 5. Other indicators of people's health were the results of blood tests, blood pressure, and some individual habits, such as smoking.
Including people's health among the regressors could raise endogeneity problems: depression might result from poor health and vice versa (i.e. poor health might result from a depressed mood that prevents individuals from taking care of themselves).
Other socio-economic variables that might be considered determinants of depression are also likely to be endogenous. High frequencies of mental disorders are associated with low education (44,45), low family income, as already noted (18,46), and participation in the workforce (47), especially for women. To account for labor market status, a dummy variable equal to 1 if the respondent was an employed woman and 0 otherwise was included as a regressor.
To address these endogeneity biases, two Instrumental Variables Probit (IVP) models were estimated, using as instruments: 1) the average income for workers and education level (college and high school) in the first model, which treated female workforce participation as the endogenous variable; and 2) the results of blood analyses (anemia, cholesterol level, glucose level), blood pressure and smoking in the second model, where reporting a good level of health was the variable suspected to be endogenous. The hypothesis tested concerns the existence of gender differences in reporting depressed mood: females are more likely to report such symptoms, but family factors and workforce participation should counteract this tendency. The paper is organized as follows: the next section, on methods, describes the IVP model, the data employed and the variables selected; the results of the estimations are presented in the following section; comments on the results and implications for health policy conclude the study.

The statistical model (IVP)
The first empirical model estimated in this study can be written as follows:

$$y_i = \alpha + \mathbf{x}_i'\boldsymbol{\beta} + u_i \qquad (1)$$

where $y_i$ is the binary dependent variable (depression yes/no), $\alpha$ is the constant term, $\boldsymbol{\beta}$ is a $k \times 1$ vector of coefficients, $\mathbf{x}_i'$ is the $i$th row of the $n \times k$ covariate matrix $\mathbf{X}$, and $u_i$ is the error term. However, the probit model can be biased because of endogeneity: when the regressors are correlated with the error term ($E(\mathbf{X}'\mathbf{u}) \neq \mathbf{0}$), the estimates are inconsistent (48). One way to overcome this issue is to use instrumental variables. The model can then be written as a system of a structural equation and a first-stage (reduced-form) equation for the endogenous regressors:

$$y_{1i}^{*} = y_{2i}\boldsymbol{\beta} + \mathbf{x}_{1i}\boldsymbol{\gamma} + u_i \qquad (2a)$$

$$y_{2i} = \mathbf{x}_{1i}\boldsymbol{\Pi}_1 + \mathbf{x}_{2i}\boldsymbol{\Pi}_2 + v_i \qquad (2b)$$

with $y_{1i} = 1$ if $y_{1i}^{*} > 0$ and $y_{1i} = 0$ otherwise. Here $y_{1i}^{*}$ is the latent propensity underlying the observed dependent variable for the $i$th observation, i.e. the answer to the question, "Do you currently have chronic depression?"³ The self-reported measure used here may not capture the true presence of the condition: it might pick up cultural differences in how subjective feelings are described, and it may be influenced by the timing of the interview and other circumstances. Nevertheless, self-reported depression measures are increasingly used (50), not least because they can easily be administered to large samples (51,52).
In (2a), $y_{2i}$ is a vector of endogenous variables (in our case, female workforce participation in the first model and reporting good health in the second); $\mathbf{x}_{1i}$ and $\mathbf{x}_{2i}$ are, respectively, vectors of exogenous regressors and of the variables used as instruments; $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ are vectors of structural parameters. In (2b), $\boldsymbol{\Pi}_1$ and $\boldsymbol{\Pi}_2$ are matrices of parameters. By assumption, $(u_i, v_i) \sim N(\mathbf{0}, \boldsymbol{\Sigma})$. The endogenous probit model can be estimated through a two-step procedure, such as the one proposed by Newey (53,54), or by maximum likelihood; Stata version 10.0, used for the estimations here, performs maximum likelihood estimation (MLE) (55). As for the selection of suitable instruments, when panel data are available a good option could be to use the first lag of each potentially endogenous variable, as lags can yield consistent estimators of the coefficients of interest (54). This is not possible here, as the data are cross-sectional.
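To make the logic of (2a)-(2b) concrete, here is a minimal, self-contained Python sketch on simulated data. It is not the authors' code and not what Stata's maximum-likelihood routine computes: it implements the control-function two-step of Rivers and Vuong, a simpler relative of Newey's estimator. Assumptions: one endogenous regressor, one instrument, jointly normal errors with Var(u) = 1; all function names and the simulation design (coefficients 0.5, 1.0, 0.8 and error correlation 0.5) are hypothetical. The first stage regresses the endogenous variable on the instrument; the second stage adds the first-stage residual to the probit. The coefficient on that residual is zero under exogeneity, and the remaining coefficients are rescaled to the structural scale.

```python
import math
import random


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def probit(X, y, iters=50, tol=1e-8):
    """Probit MLE by Fisher scoring; each row of X already includes a constant."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, xi))
            p = min(max(norm_cdf(eta), 1e-12), 1.0 - 1e-12)
            d = norm_pdf(eta)
            g = d * (yi - p) / (p * (1.0 - p))  # contribution to the score
            w = d * d / (p * (1.0 - p))         # contribution to the expected information
            for j in range(k):
                score[j] += g * xi[j]
                for l in range(k):
                    info[j][l] += w * xi[j] * xi[l]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def control_function_probit(y1, y2, z):
    """Two-step control function with one endogenous regressor y2 and one instrument z."""
    n = len(z)
    # Step 1: first-stage OLS of y2 on a constant and z; keep the residuals v_hat.
    mz, m2 = sum(z) / n, sum(y2) / n
    slope = sum((a - mz) * (b - m2) for a, b in zip(z, y2)) / sum((a - mz) ** 2 for a in z)
    v_hat = [b - (m2 - slope * mz) - slope * a for a, b in zip(z, y2)]
    # Step 2: probit of y1 on a constant, y2 and v_hat (theta = 0 under exogeneity).
    const, b_y2, theta = probit([[1.0, b, v] for b, v in zip(y2, v_hat)], y1)
    # Step-2 coefficients are scaled by 1/sqrt(1 - rho^2); undo this with sigma_v^2.
    sigma2_v = sum(v * v for v in v_hat) / n
    scale = 1.0 / math.sqrt(1.0 + theta ** 2 * sigma2_v)
    return const * scale, b_y2 * scale, theta


def simulate(n=5000, rho=0.5, seed=1):
    """Hypothetical data in which the regressor y2 is correlated with the probit error u."""
    rng = random.Random(seed)
    z, y2, y1 = [], [], []
    for _ in range(n):
        zi, vi = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        ui = rho * vi + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        y2i = 0.8 * zi + vi                              # first stage, as in (2b)
        z.append(zi)
        y2.append(y2i)
        y1.append(1 if 0.5 + 1.0 * y2i + ui > 0 else 0)  # true coefficient on y2 is 1.0
    return z, y2, y1


if __name__ == "__main__":
    z, y2, y1 = simulate()
    naive = probit([[1.0, b] for b in y2], y1)
    _, b_cf, theta = control_function_probit(y1, y2, z)
    print(f"naive probit: {naive[1]:.2f}  control function: {b_cf:.2f}  theta: {theta:.2f}")
```

In this design, the naive probit overstates the coefficient on y2 (its probability limit is about 1.42), while the rescaled control-function estimate should land near the true value of 1.0. A significantly nonzero theta plays the same role as the exogeneity test discussed in the text.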

Sample selection
The Korea National Health and Nutrition Examination Survey (KNHANES) was employed in the analysis. The KNHANES is a population-based cross-sectional survey designed to assess the health-related behavior, health condition, and nutritional state of Koreans. Beginning with the fourth wave in 2007, it became an annual survey. The KNHANES provides a rich source of data, and its participants are representative of the Korean population. In the last decade, the database has been increasingly employed in medical and epidemiological studies⁴. The data used in this study are from 2008, and information was gathered via face-to-face interviews. The analysis was restricted to respondents aged 19-75 years: respondents younger than 19 or older than 75 were excluded because they are more likely to develop depression due to hormonal shifts in puberty and illness in old age, which might have biased the results. Hence, the final sample included 6,751 respondents out of a total of 9,744 observations.

Measures
Information contained in the KNHANES on residence, work and people's health was selected for the empirical analysis. The variables employed were individuals' age and age squared, gender, marital status, number of family members, residence in metropolitan areas, female participation in the workforce (a dummy variable equal to 1 if the respondent is a woman currently employed and 0 otherwise), average income for workers and educational level. Variables related to health were high blood pressure (a dummy variable equal to 1 if systolic blood pressure was higher than 140 and 0 otherwise), the level of hemoglobin (anemia), the level of cholesterol in the blood (cholesterol), the level of glucose in the blood (glucose) and smoking status. Two IVP models were estimated. The first considered the effect of socio-economic variables, such as living in a metropolitan area, the average income of workers, and the level of education, which are likely to determine women's participation in the workforce. The second focused on a satisfactory level of health (a dichotomous variable equal to 1 if the reported health status on a Likert scale was ≥3 on a scale 0-5 and 0 otherwise). "Good health" was instrumented with clinical parameters, namely blood pressure, anemia, cholesterol and glucose levels in blood tests, and smoking status. After each estimation, Wald tests of exogeneity were conducted to check whether the IVP approach was needed: if the null hypothesis of exogeneity cannot be rejected, there is no need for an IVP approach, and the estimates of a probit model are more efficient. Tables 1 and 2 display descriptive statistics for all the variables included in the empirical analysis.
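A minimal sketch of how some of the variables described above might be built from individual survey records, including the 19-75 age restriction. The field names (age, sex, married, household_size, region, employed, systolic_bp, smoker) are hypothetical and are not actual KNHANES variable names; only a subset of the variables is shown.

```python
# Hypothetical field names: the real KNHANES variable names are not given in the text.
METRO_REGIONS = {"Seoul", "Incheon", "Gyeonggi-do"}


def in_sample(record, min_age=19, max_age=75):
    """Age restriction used for the analysis sample (19-75 years)."""
    return min_age <= record["age"] <= max_age


def build_regressors(record):
    """Map one survey record (a dict) to some of the covariates and dummies listed above."""
    female = record["sex"] == "F"
    return {
        "age": record["age"],
        "age_sq": record["age"] ** 2,
        "female": int(female),
        "married": int(record["married"]),
        "family_members": record["household_size"],
        "metro": int(record["region"] in METRO_REGIONS),
        # 1 only for a woman who is currently employed; 0 for men and non-employed women.
        "female_worker": int(female and record["employed"]),
        # 1 if systolic blood pressure is above 140 (mmHg), 0 otherwise.
        "high_bp": int(record["systolic_bp"] > 140),
        "smoker": int(record["smoker"]),
    }


if __name__ == "__main__":
    respondent = {"age": 42, "sex": "F", "married": True, "household_size": 4,
                  "region": "Incheon", "employed": True, "systolic_bp": 128, "smoker": False}
    if in_sample(respondent):
        print(build_regressors(respondent))
```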
Forty-three percent of people in the sample were male; more than eighty-five percent were married or lived with someone, while 61.2% were currently employed. Salaries differ by gender: the average monthly salary was 106.1 for male workers vs. 78.2 for female workers (in units of 10,000 KRW). Women's workforce participation is also limited: employed women made up 28.4% of the sample. About 34% of people in the sample had completed college education. More than 42.0% lived in metropolitan areas, Seoul and Gyeonggi-do being the most populated regions. More than 75.0% of people reported being in good health, and almost 15.0% of the sample answered positively when asked about depressed mood. As expected, a higher percentage of women than men reported depression, although this measure is a subjective one rather than a clinical diagnosis of depression⁵. The results of the IVP estimation controlling for the potential endogeneity of female workforce participation are reported in Table 3, and the marginal effects in Table 4. The estimated probability of reporting depression is 19.7%. There is a large difference by gender: the estimated probability of reporting depression is 37.2% higher for women than for men. The second model focused on health-related variables. "Good health", which takes the value 1 if a score higher than 3 is reported for health status and 0 otherwise, is the variable for which an instrumental equation was estimated. The instruments relate to individuals' habits (being a smoker or not) and clinical conditions (blood pressure, anemia, cholesterol and glucose).

Results
Results of the estimations are reported in Table 5. Some estimated coefficients are smaller, as in the case of the number of family members, while the education variables are not statistically significant. As a robustness check, we tested the exogeneity of the variables included in our empirical specification; to that end, an instrumental variables approach was used.
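With one endogenous regressor per model, the exogeneity check reduces to a Wald test on a single parameter. A minimal Python sketch of that test and of the resulting model choice. It assumes the tested parameter is the one linking u_i and v_i (as far as we know, Stata reports it for the transformed correlation atanh(ρ)); the function names, the 5% level and the estimate and standard error in the usage lines are hypothetical.

```python
import math


def wald_test_one_parameter(estimate, std_error):
    """Wald statistic for H0: parameter = 0 and its chi-squared(1) p-value."""
    w = (estimate / std_error) ** 2
    # With one degree of freedom, P(chi2 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value


def choose_model(estimate, std_error, alpha=0.05):
    """Keep the IV probit only if exogeneity is rejected; otherwise prefer the plain probit."""
    _, p = wald_test_one_parameter(estimate, std_error)
    return "ivprobit" if p < alpha else "probit"


if __name__ == "__main__":
    # Hypothetical estimate and standard error, not taken from the paper's tables.
    print(wald_test_one_parameter(0.12, 0.10))
    print(choose_model(0.12, 0.10))
```

When the null hypothesis of exogeneity is not rejected, the IVP estimates are still consistent but less efficient than the probit, which is why the plain probit is preferred in that case.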
Since the Wald tests did not allow us to reject the null hypothesis of exogeneity, a probit model was estimated, followed by a link test for model specification. Table 6 displays the marginal effects of the probit model. The probit coefficients show the same signs and significance as those of the IVP.
Again, high school and college education are significantly and inversely correlated with the dependent variable: the marginal effect on the probability of reporting a depressed mood is -24.6% for people who completed high school and -3.1% for college graduates. The probability of reporting depression falls to 12.4% when good health (score >3 on a Likert scale) is reported.
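The marginal effects reported for the probit are changes in the probability of reporting a depressed mood. A minimal Python sketch of how such effects are commonly computed from probit coefficients by averaging over the sample: for a dummy regressor, the average of Φ(x'β with the dummy set to 1) minus Φ(x'β with the dummy set to 0); for a continuous regressor, the average of φ(x'β)·β_j. Effects can also be evaluated at the sample means instead; the text does not state which convention underlies Tables 4 and 6. The coefficients and data rows in the usage lines are hypothetical.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def index(beta, x):
    """Linear probit index x'beta."""
    return sum(b * v for b, v in zip(beta, x))


def ame_dummy(beta, X, j):
    """Average change in P(y = 1) when dummy column j is switched from 0 to 1."""
    total = 0.0
    for row in X:
        on, off = list(row), list(row)
        on[j], off[j] = 1.0, 0.0
        total += norm_cdf(index(beta, on)) - norm_cdf(index(beta, off))
    return total / len(X)


def ame_continuous(beta, X, j):
    """Average derivative of P(y = 1) with respect to continuous column j."""
    return sum(norm_pdf(index(beta, row)) * beta[j] for row in X) / len(X)


if __name__ == "__main__":
    # Hypothetical coefficients: [constant, high-school dummy, family members].
    beta = [-0.6, -0.8, -0.05]
    X = [[1.0, 0.0, 3.0], [1.0, 1.0, 4.0], [1.0, 1.0, 2.0]]
    print(f"high school: {ame_dummy(beta, X, 1):+.3f}")
    print(f"per family member: {ame_continuous(beta, X, 2):+.3f}")
```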

Discussion
The empirical results are in line with previous empirical literature on the determinants of self-reported depression.
Overall, women tend to experience depression more often than men. Being married and socio-economic factors were found to act as protective factors against depression. We found a nonlinear relationship between age and depression: the coefficient on age is positive and significant, while that on age squared is negative and significant; these coefficients are small, however, indicating that aging has little effect on the probability of reporting depression. Surprisingly, living in a metropolitan area is associated with a lower risk of experiencing depression. A potential explanation for this pattern is that big cities might offer better job opportunities and career advancement. Indeed, women might be more vulnerable than their male counterparts in a competitive environment. However, some recent studies have shown that quality of life may be higher in urban areas and that it might be more difficult to recover from depression in rural areas, especially for elderly women (57). Although urban life in Korea can be very stressful because of working conditions (a hierarchical society and poor conditions for non-regular workers) and the high cost of living, urban areas offer more resources. In addition, some studies have shown that chronic medical conditions are more prevalent in rural areas, a circumstance that may also affect the overall quality of life (58,59).

IVP coefficients (standard errors in brackets)
The risk of depression decreases as family size increases. This result is consistent with the literature on self-reported happiness: people living with their grandchildren have been shown to enjoy a higher quality of life (i.e. the larger the number of family members, the higher the level of happiness) (60). By the same token, it can be argued that the more people live in the same family, the lower the risk of depression. Workforce participation, especially for women, can be influenced by several factors, mainly related to the family. In the empirical estimation, we included the average income for workers, which might induce higher participation by women when family income is so low that another source of income is necessary. However, higher female workforce participation might in turn make it harder for women to manage their roles in the family and in the workplace, increasing the risk of depression. As expected, good health has a negative impact on the probability of reporting depression: a level of health higher than 3 out of 5 on a Likert scale is inversely related to the probability of experiencing depression. In general, this study confirms the main results of the literature: depression is significantly associated with socio-economic conditions, such as educational attainment, work, and income. The present analysis could be refined by addressing potential measurement bias: here, depression was detected through the answers given to the question, "Do you currently have chronic depression?" A more detailed investigation of a selected sample might collect information on the level of depression through validated questionnaires. This study sheds light on the frequency of depressed mood in the population considered.
It can be a good starting point for further studies analyzing, for instance, the economic impact of depression and the possible effects of counseling and support programs directed especially at the groups more likely to report a depressed mood, such as women, to help them counter depressive symptoms. In this perspective, some studies, albeit only in high-income countries, have identified cost-effective approaches for the prevention of depression across the life course (60) or estimated the effect of preventive programs for mothers in the first two years after birth (61). The positive consequences of these programs could be evaluated both in terms of economic benefits, such as increased productivity and reduced consumption of drugs and antidepressants, and, more importantly, in terms of health outcomes, such as a higher quality of life.

Ethical Issues
This work was supported by a research grant from Seoul Women's University (2015) and also approved by the ethics committee of Seoul Women's University, Seoul, South Korea.
1. In the absence of any measure of the relationship between each individual and the other household members, of the ages of the other family members, or of the type and quality of these relationships, household size might not be the best proxy for family support. A household of two adults and five pre-adolescent children would not buffer against depression in the same way as a household of four adults and three pre-adolescent children: in the first case, the large number of children may itself be a risk factor for depression. Information about family members' ages cannot be obtained from the database employed and, to our knowledge, no previous studies on this topic have specifically considered the age of each family member.
5. Looking at the KNHANES database, we noticed that another variable that could be considered a proxy for the presence of a depressed mood is the feeling of "uneasiness", which represents the difficulties faced by individuals in coping with daily life. Respondents answered the question, "Are you having difficulty in coping with daily social life due to physical and/or mental problems?" (Yes = 1, No = 0). Overall, the feeling of uneasiness was reported by 18.1% of respondents.