The relation of gender role attitudes with depression and generalised anxiety disorder in two Russian cities

Background: Self-reported traditional gender role attitudes (GRAs) have been associated with worse mental health in western countries. This study examined the association of GRAs with symptoms of depression and generalised anxiety disorder (GAD) in two Russian cities. Methods: We used interview data from the cross-sectional Know Your Heart Study, conducted among 5099 adults aged 35-69 in the Russian cities of Arkhangelsk and Novosibirsk between 2015 and 2017. Attitudes about gender inequality and about the division of labour between women and men at home or in the public sphere were each measured by a single item. Binary variables indicating the presence of symptoms of depression and GAD were defined by a cut-off of ≥ 5 on the PHQ-9 and GAD-7 scores, respectively. Logistic regression was used to estimate crude and adjusted associations. Results: After controlling for confounding, there was evidence that all types of GRAs were associated with symptoms of depression and GAD in a pattern consistent with a U-shape, with stronger evidence for all relationships with depression than with GAD. Odds of depressive symptoms were elevated among participants strongly agreeing with gender inequality and the gendered division of labour. There was good evidence for effect measure modification by


Introduction
Globally, the burden of years lived with disability (YLD) due to depression and anxiety disorders has been increasing since 2005. In 2015, depression ranked 3rd and anxiety disorders 9th among the most common causes of YLD (Vos et al., 2016). In Russia, a cross-sectional study from 2000 using the Centre for Epidemiologic Studies Depression Scale (CES-D) found a point prevalence of depressive symptoms of 44% among women and 23% among men aged 45-64 living in Novosibirsk, which was comparable to levels found internationally in 2000 (Bobak et al., 2006). In a cross-sectional study conducted in Arkhangelsk among adults aged 18-90 in 2000, the point prevalence of symptoms of depression was estimated at 34% among women and 11% among men, and that of symptoms of anxiety at 53% among women and 21% among men (Averina et al., 2005). Self-harm was the third most common cause of years of life lost in Russia in 2016 (Starodubov et al., 2018). Depressive and anxiety disorders can be classified as common mental disorders (CMDs), which share some risk factors and treatment approaches (National Collaborating Centre for Mental Health (UK), 2011).
Gender is a "social construct regarding culture-bound conventions, roles, and behaviours for, as well as relations between and among, women and men and boys and girls." (Krieger, 2003). Gender roles are behavioural norms applied to women and men, for example in the spheres of the family, the labour force or education (Cuff and Payne, 1979;Tannenbaum et al., 2016). These norms ascribe childcare and domestic work to women, for instance, and family leadership and breadwinning to men (Parsons and Bales, 1955;Risman and Davis, 2013). Norms about gender roles are assumed to be socially produced, such as through the socialisation and education of girls and boys during early childhood (Heise et al., 2019).
Gender role attitudes (GRAs) have been frequently assessed on the individual level in sociological surveys by asking for beliefs regarding appropriate behaviour for women and men (Halman et al., 2011). Theoretical considerations suggested that the construct of GRAs is multidimensional (Larsen and Long, 1988). In order to assess content validity of measures of GRAs, Constantin and Voicu developed a theoretical framework that attempts to cover all dimensions (Constantin and Voicu, 2015). According to their framework, which is mainly based on the typology by Jelen, GRAs include attitudes towards two dimensions of power balance (inequality and complementarity) in two broad social contexts (private and public) (Jelen, 1988). The first dimension of power balance, gender inequality, refers to the superiority of a sex, typically the superiority of men over women. The second dimension of power balance, gender complementarity, describes the gendered division of labour in performing tasks such as domestic work or childcare without necessarily assuming inequality. Finally, the framework distinguishes GRAs about roles of the private and the public sphere, whereby the private sphere covers roles within the home and the public sphere all roles outside (Constantin and Voicu, 2015).
Combining attitudes towards gender inequality and complementarity theoretically yields four types of GRAs (Constantin and Voicu, 2015). Traditional GRAs assume both inequality and complementarity. Specialised GRAs reject inequality but accept complementarity, and modern GRAs reject both. According to Jelen, there are also chauvinist GRAs, which assume inequality but reject complementarity (Jelen, 1988). However, Constantin and Voicu consider this last type irrelevant in practice (Constantin and Voicu, 2015). Adding the dimension of context yields eight types of GRAs, namely traditional, specialised, modern and chauvinist GRAs of the public and of the private sphere. In contrast to GRAs, we use the term performed gender roles to refer to actual practices such as childcare or employment.
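The typology above can be written as a simple lookup, and crossing it with the private/public context gives the eight types. This is only an illustrative sketch. The function name is hypothetical. Reducing each power-balance dimension to a yes/no "accepts" flag is our simplification, because survey attitudes are graded.

```python
from itertools import product


def gra_type(accepts_inequality: bool, accepts_complementarity: bool) -> str:
    """Classify a GRA by the two power-balance dimensions
    (after Jelen, 1988; Constantin and Voicu, 2015)."""
    if accepts_inequality and accepts_complementarity:
        return "traditional"
    if accepts_complementarity:   # rejects inequality, accepts complementarity
        return "specialised"
    if not accepts_inequality:    # rejects both dimensions
        return "modern"
    return "chauvinist"           # inequality without complementarity


# Crossing the four types with the two contexts gives eight types of GRAs.
EIGHT_TYPES = [
    f"{gra_type(ineq, comp)} ({sphere} sphere)"
    for sphere, ineq, comp in product(("private", "public"), (True, False), (True, False))
]
```

For example, `gra_type(False, True)` classifies someone who rejects male superiority but accepts a gendered division of labour as holding specialised GRAs.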
In Russia, gender roles and norms about the superiority of men and the division of labour between the sexes are a prevailing feature of society. During the Soviet era, male superiority and the division of labour between women and men remained institutionalised, despite an official rhetoric of gender equality and high proportions of women in leading positions (Connell, 1987). In contemporary Russia, these gender roles are being reinforced further (Motiejunaite and Kravchenko, 2008). Workplace discrimination against women has been increasing since the fall of the Iron Curtain. So has the promotion of the female role as caretaker of the family and as a mother who should remain at home (Dawn Metcalfe and Afanassieva, 2005;Shiraev, 1999). Even in the young generation, men seem to show almost no inclination to take on domestic work (Ashwin and Lytkina, 2004). However, female students in Russia, for example, are questioning prevailing norms about gender roles and demanding gender equality in private and public life (White, 2005).
It has been hypothesised that gender inequality and the division of household labour act as psychosocial stressors that might facilitate the development of depression and anxiety disorders (Rosenfield and Mouzon, 2013). A review summarising evidence on performed gender roles and health found a large body of evidence on psychological distress. It was consistently lower among employed women than among women in the role of family caregivers (Mayor, 2015). This evidence has been generated mainly in populations of industrialised countries (Mayor, 2015), and little evidence is available from low- and middle-income countries. A study from Ghana found that stress perceived to be due to sociocultural prescriptions of gender role norms was associated with anxiety and depressive symptoms among girls but not boys (Anyan and Hjemdal, 2018).
To our knowledge, there is no review of the link between GRAs and CMDs, although some findings are available from cross-sectional surveys. In the Netherlands, the UK and the US, agreement with the superiority of men and the division of labour in the family was associated with worse outcomes among both women and men. These included worse self-rated well-being, psychological distress, symptoms of depression and suicidal thoughts (Arends-Tóth and van de Vijver, 2007;Glass and Fujimoto, 1994;Hunt et al., 2006;Sweeting et al., 2014). A study from the UK found that such agreement was associated with worse self-rated mental health among women but not men. Studies from Sweden that included only women found an association with worse self-rated mental health (Eek and Axmon, 2015;Read and Grundy, 2011;Staland-Nyman et al., 2008). Research from Russia used data from the Russian Longitudinal Monitoring Survey (RLMS). It suggests that agreement with the division of labour in the family is associated with lower perceived control, used as a proxy for mental health (Barrett and Buckley, 2009). Finally, it has been suggested that the association of GRAs with psychological distress differs by age group (Sweeting et al., 2014).
Prospective studies on the relationship of GRAs with CMDs are not available. However, a comprehensive longitudinal study from Sweden investigated how perceived gender inequality in the couple relationship and the actual distribution of domestic work related to incident symptoms of depression. Perceived gender inequality in the couple relationship was associated with developing symptoms of depression among women. An unequal distribution of domestic work was not associated with incident symptoms of depression (Hammarström and Phillips, 2012). In the same cohort, an equal distribution of domestic work was related to lower psychological distress in the subgroup of cohabiting women (Harryson et al., 2012). Among cohabiting study participants, perceived gender equality in the couple relationship was associated with fewer symptoms of depression among both women and men (Harryson et al., 2012).
Inequitable gender role norms prevail in Russia, and there is evidence that GRAs are related to CMDs. Together these raise questions about the relationship between GRAs and CMDs in the Russian context. This study aims to explore the relationship of individual GRAs with prevalent symptoms of depression and generalised anxiety disorder (GAD) in two Russian cities, and whether this relationship differs by sex and age.

Methods
Our project was conducted using data of the "Know Your Heart" (KYH) study, a cross-sectional survey composed of an interview and a health check-up. KYH was part of the International Project on Cardiovascular Disease in Russia (IPCDR). The rationale of KYH was to determine sociodemographic characteristics, health system use as well as risk factors and biomarkers of the cardiovascular system to compare the current cardiovascular phenotype in Russia to Norway and elucidate driving factors of high cardiovascular mortality in Russia (Cook et al., 2018).
The target population of KYH were inhabitants of both sexes of the Russian cities of Arkhangelsk and Novosibirsk aged 35 to 69 years. Regional health insurance funds provided a sampling frame containing the sex and age of occupants at individual addresses. A random sample of individuals stratified by 5-year age group and sex was drawn from this sampling frame. Only the age and sex of each individual were provided; names were not revealed due to data protection regulations. A professional survey company visited the homes and asked to interview the randomly selected person of the respective sex and age. Surveying started in November 2015 and ended in December 2017. Face-to-face interviews with the selected individual were conducted at the participant's home. They included questions about GRAs, mental health and sociodemographic characteristics. All interviewers were female. Data were collected by computer-assisted personal interviewing (CAPI) on tablets. Data quality of key variables was monitored monthly. Details of the survey design and response rates are given elsewhere (Cook et al., 2018).
Prevalent symptoms of depression in the past two weeks were measured using the 9-item Patient Health Questionnaire depression measure (PHQ-9) (Kroenke et al., 2001). In this analysis, we created a binary variable indicating the presence of any depressive symptoms (mild, moderate or severe). It used a PHQ-9 cut-off of ≥ 5, the lower boundary for mild symptoms of depression (Kroenke et al., 2001). Symptoms of GAD in the past two weeks were measured with the 7-item Generalised Anxiety Disorder (GAD-7) scale (Spitzer et al., 2006). Similarly, we used a GAD-7 cut-off of ≥ 5 to create a binary variable indicating the presence of any symptoms of GAD (mild, moderate or severe) (Spitzer et al., 2006). The outcomes were not analysed on a continuous scale. The distributions of the PHQ-9 and GAD-7 scores were strongly right-skewed (approximately lognormal) with an excess of zeros.
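The scoring and dichotomisation described above can be sketched as follows. The severity bands are those usually reported for the PHQ-9 (Kroenke et al., 2001) and GAD-7 (Spitzer et al., 2006). The function names are hypothetical. The sketch assumes complete item responses coded 0 ("not at all") to 3 ("nearly every day").

```python
# Lower bound of each severity band, checked from the highest band down.
PHQ9_BANDS = [(20, "severe"), (15, "moderately severe"), (10, "moderate"), (5, "mild"), (0, "minimal")]
GAD7_BANDS = [(15, "severe"), (10, "moderate"), (5, "mild"), (0, "minimal")]


def total_score(items, n_items):
    """Sum of item scores (PHQ-9: 9 items, range 0-27; GAD-7: 7 items, range 0-21)."""
    if len(items) != n_items:
        raise ValueError(f"expected {n_items} item responses, got {len(items)}")
    if any(not 0 <= item <= 3 for item in items):
        raise ValueError("each item must be scored 0-3")
    return sum(items)


def severity(score, bands):
    """Label of the highest band whose lower bound the score reaches."""
    for lower, label in bands:
        if score >= lower:
            return label
    raise ValueError("negative score")


def any_symptoms(score, cutoff=5):
    """Binary outcome: cutoff=5 gives 'any symptoms' (main analysis);
    cutoff=10 gives 'moderate or severe symptoms' (sensitivity analysis)."""
    return score >= cutoff
```

For example, `any_symptoms(total_score(responses, 9))` would give the main-analysis depression outcome for a hypothetical list `responses` of nine PHQ-9 item scores.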
Four items about individual GRAs were used in KYH. They were included because they are established in the RLMS, a large national economic and health survey in Russia (Kosolapov et al., 2002). The translated wording of each question was:
Item 1: "To what extent do you agree or disagree with each of the following statements? It's a husband's responsibility to earn money and a wife's responsibility to take care of the house and children." (division of labour in the family).
Item 2: "To what extent do you agree or disagree with each of the following statements? It's more suitable for a man than for a woman to be a leader or manager" (men's superiority in the labour market).
Item 3: "To what extent do you agree or disagree with each of the following statements? Men and women should play equally important parts in politics." (unequal gender roles in politics).
Item 4: "In our time various opinions are given on who should be the head of the family. Which statement is closest to your opinion on this subject?" (family leadership).
Answers to items 1-3 were recorded on an ordinal five-point Likert scale ranging from "absolute agreement" to "absolute disagreement". Because item 3 asked about equally important roles of women and men in politics, its coding was reversed. Disagreement with this statement was therefore scored as agreement with unequal gender roles in politics. Item 4 was a categorical variable on a nominal scale with three response options: (a) "The husband should be responsible for the family, the head of the family, and the wife should be obedient to her husband" (Men superior); (b) "The husband and the wife should have equal right and be equally responsible for the family" (Both are equal); (c) "The wife should be responsible for the family, the head of the family" (Women superior).
For all variables, the middle option ("both yes and no", or "both are equal") was used as the reference category, because it was considered the most neutral answer.
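The recoding described in the two paragraphs above can be sketched as follows. The numeric Likert coding (1 = "absolute agreement" to 5 = "absolute disagreement", 3 = "both yes and no") is an assumption, since the text does not state it. All names are hypothetical.

```python
# Assumed coding (not stated in the text): 1 = absolute agreement ... 5 = absolute disagreement.
LIKERT_LEVELS = (1, 2, 3, 4, 5)
LIKERT_REFERENCE = 3  # "both yes and no", the neutral middle option

# Item 4 (family leadership) is nominal; "both are equal" is the reference.
FAMILY_HEAD_LEVELS = ("men superior", "both equal", "women superior")
FAMILY_HEAD_REFERENCE = "both equal"


def reverse_code(x):
    """Reverse a 5-point Likert response (1<->5, 2<->4, 3 unchanged).

    Applied to item 3, so that disagreeing that men and women should play
    equally important parts in politics counts as agreement with unequal roles.
    """
    if x not in LIKERT_LEVELS:
        raise ValueError(f"unexpected Likert response: {x!r}")
    return 6 - x


def dummy_code(x, levels=LIKERT_LEVELS, reference=LIKERT_REFERENCE):
    """Indicator coding: one 0/1 column per non-reference category.

    A respondent in the reference category scores 0 in every column, so each
    regression coefficient compares one category with the neutral answer.
    """
    if x not in levels:
        raise ValueError(f"unexpected response: {x!r}")
    return {str(level): int(x == level) for level in levels if level != reference}
```

For example, `dummy_code(reverse_code(item3_response))` would produce the four indicator columns for a hypothetical item 3 answer.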
To assess whether the items measure a single underlying psychosocial construct, we calculated pairwise Spearman correlations and Cronbach's α as a measure of internal consistency (Cronbach, 1951). All correlations were weak, the highest being 0.22 for items 1 and 2, and Cronbach's α for the summary score of all items was 0.37. These results indicated the absence of one common underlying construct. Hence, no data reduction methods combining the four items were applied. This is in line with theory suggesting that the items reflect distinct types of GRAs. According to Constantin and Voicu, item 1 measures specialised GRAs of the private sphere, while items 2 and 3 measure traditional GRAs of the public sphere (Constantin and Voicu, 2015). Item 4 was assumed to represent traditional GRAs of the private sphere because it asked about the superiority of either the husband or the wife within the family. Consequently, each item was treated as a separate exposure in the analyses.
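The following is a minimal pure-Python sketch of the two internal-consistency checks described above. It covers Cronbach's α and Spearman's rank correlation, computed as the Pearson correlation of average ranks so that ties are handled. The function names are ours. The sketch assumes complete, numerically coded responses aligned across items; in practice a statistical package would normally be used.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """items: k lists, each holding one item's scores for the same n respondents.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


def average_ranks(xs):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Uncorrelated items give α near 0 and identical items give α = 1. Values such as the observed 0.37 therefore indicate that the four items do not form a single scale.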
A conceptual framework of the exposure-outcome relationship under study was defined prior to analysis. Presence of any symptoms of depression or GAD in the last two weeks were the outcomes of interest. Confounders were selected with the rationale that reverse causality should be unlikely. Confounders were considered not to be subject to reverse causality if they were unlikely to be caused by GRAs and/or common mental disorders. A priori selected confounders were study site, sex, age and nationality to represent sociodemographic features. Educational level was selected as a further confounder to represent socioeconomic position (SEP) in early adulthood. Educational level was considered unlikely to be affected by reverse causality of mental health among people above the age of 35. All confounders were included in model 1. We assumed that performed gender roles such as being in employment, cohabitation and parenthood might be located on the causal pathway from GRAs to mental health. Hence, we adjusted for performed gender roles in a second model in order to obtain an effect estimate for GRAs independent from an effect of performed gender roles. Factors such as chronic illness or psychiatric medication were not adjusted for because we assumed that these factors might be mediators of a possible effect of GRAs or performed gender roles on prevalent symptoms of CMDs.
Education was categorised into three groups (incomplete secondary, complete secondary and higher than secondary education). Employment status was dichotomised into being in regular paid employment or not. Both employment groups included some participants who also reported being retired. This is in keeping with the situation in Russia, where it is not unusual to retire and receive a pension while continuing in some form of paid employment. The proportion of retired people was much higher among those not in regular paid employment than among those in regular paid employment (82% vs. 39%). Participants who reported living with a partner, either in or outside marriage, were considered cohabiting. Participants who did not live with a partner at the time of the interview were considered not cohabiting. This group comprised those who gave their marital status as not cohabiting, divorced, widowed or never married. Parenthood was assumed if participants reported having one or more children.
Separate logistic regression models were fitted for each of the four exposures and each outcome, adjusting for confounders. Model 1 adjusted for study site, age group, sex, nationality and level of education. Model 2 additionally adjusted for employment status, cohabitation and parenthood. All models were fitted to the same set of complete observations. For Likert-scaled variables, the category indicating indifference was used as the reference. Likelihood-ratio tests were used to assess evidence for associations. P-values were interpreted as the strength of evidence for an association, without a cut-off for significance. Very low values were interpreted as good evidence and p-values close to 0.05 as weak evidence. Finally, Likert-scaled variables were examined for a linear association with increasing disagreement with GRAs. Wald tests were used to assess evidence for a linear trend. Likelihood-ratio tests gave p-values for departure from linearity ("p (dl)").
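The likelihood-ratio strategy described above can be sketched in pure Python. The approach fits nested logistic models by maximum likelihood and doubles the difference in log-likelihoods. That statistic is compared with a χ² distribution whose degrees of freedom equal the number of extra parameters. This is an illustrative sketch, not the authors' code: the Newton-Raphson fitter and the closed-form χ² upper tail (valid for integer degrees of freedom) are our choices. All names are hypothetical, and the fitter assumes the data show no perfect separation. The Wald test for linear trend is not shown.

```python
import math


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def _prob(beta, xi):
    return 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))


def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X: rows of covariates, each starting with 1.0 for the intercept.
    Returns (coefficients on the log-odds scale, log-likelihood).
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X'WX
        for xi, yi in zip(X, y):
            p = _prob(beta, xi)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = sum(
        yi * math.log(_prob(beta, xi)) + (1 - yi) * math.log(1.0 - _prob(beta, xi))
        for xi, yi in zip(X, y)
    )
    return beta, loglik


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:  # exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))  # df = 1 term
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * i + 1)
    return total


def lr_test(loglik_full, loglik_reduced, df):
    """Likelihood-ratio test of nested models: returns (statistic, p-value)."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)
```

Consider departure from linearity for a five-level Likert exposure. The full model contains the confounders plus four indicator variables, and the reduced model the confounders plus a single linear score. So, for hypothetical log-likelihoods `ll_categorical` and `ll_linear`, `lr_test(ll_categorical, ll_linear, df=3)` returns the statistic and p (dl). Fitting every model to the same complete observations, as described above, keeps the log-likelihoods comparable.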
The two a priori specified effect modifiers, sex and age, were investigated using logistic regression. To increase power, age was dichotomised into 35-54 years and 55-69 years. All models investigating effect measure modification (interaction) were adjusted for the variables of model 1. Stratum-specific odds ratios were calculated, together with p-values for interaction from likelihood-ratio tests.
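A stratum-specific odds ratio is the exposure-outcome odds ratio computed within one level of the effect modifier, here an age group. The sketch below shows only the crude, unadjusted version, using 2 × 2 counts and a Woolf (log-scale Wald) confidence interval. The authors' estimates instead came from logistic models adjusted for the model 1 variables. Their interaction p-values came from likelihood-ratio tests comparing models with and without exposure × modifier terms. Function names and the example counts are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def odds_ratio(a, b, c, d, z=Z_975):
    """Crude odds ratio and Woolf 95% CI from a 2x2 table:

                 outcome+  outcome-
    exposed         a         b
    unexposed       c         d
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Woolf interval is undefined")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return or_, or_ * math.exp(-z * se_log), or_ * math.exp(z * se_log)


def stratum_specific(tables):
    """tables: mapping stratum label -> (a, b, c, d); returns (OR, lower, upper) per stratum."""
    return {label: odds_ratio(*cells) for label, cells in tables.items()}
```

With hypothetical counts, `stratum_specific({"35-54": (5, 5, 2, 8), "55-69": (3, 7, 3, 7)})` gives crude odds ratios of 4.0 and 1.0, which would suggest a stronger association in the younger stratum.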
To assess the impact of changes in the cut-off of the measurement of symptoms of depression and GAD, all analyses of exposure-outcome associations as stated above were repeated in a sensitivity analysis using different cut-off values. To measure moderate or severe depressive symptoms and symptoms of GAD, we used the cut-offs ≥ 10 of the PHQ-9 and ≥ 10 of the GAD-7 (Kroenke et al., 2001;Spitzer et al., 2006).

Description of the sample
A description of the study population is given in Table 1. The total sample included 5099 participants, of whom 2474 were recruited in Arkhangelsk and 2625 in Novosibirsk. Of the participants, 57% were female, 5% had a nationality other than Russian, and 40% had more than secondary education. Furthermore, 41% were not in regular paid employment, 31% were not cohabiting and 10% had no children.
The prevalence of depressive symptoms (PHQ-9 ≥ 5) in the past two weeks was 35%, and that of symptoms of GAD (GAD-7 ≥ 5) was 22%. After adjusting for age group and sex, there was some evidence of higher odds of depressive symptoms in Novosibirsk and among women. Odds were also higher among people with incomplete secondary education, those not in regular paid work, and those not living with a partner (Table 1). The adjusted odds of symptoms of GAD were higher in Novosibirsk, among women, among participants without regular paid work, and among those not living with a partner.

Association of GRAs with depressive symptoms
The crude odds of depressive symptoms were higher among participants strongly agreeing and strongly disagreeing with three statements: the division of labour in the family, men's superiority in the labour market and unequal gender roles in politics (Table 2). There was evidence for each association after adjusting for age group, sex, nationality, education and study site (model 1). However, except for the "absolutely agree" groups, the confidence intervals for the odds ratios crossed 1. These associations did not change substantively after additionally adjusting for employment, cohabitation status and parenthood (model 2). There was evidence that attitudes about family leadership were associated with depressive symptoms. In model 1, odds of depressive symptoms were elevated among participants believing that men should be superior to women in the family (OR 1.19 (95% CI 1.00-1.42)), compared with participants believing that women and men should be equal. However, this association was not seen in the crude model, and after further adjustment (model 2) the confidence interval crossed 1 (OR 1.16 (95% CI 0.97-1.39)). Finally, in all models, odds of depressive symptoms were elevated among participants believing that women should be the head of the family (model 2 OR 1.67 (95% CI 1.10-2.52)). There was good evidence for departure from linearity of the unadjusted associations for three attitudes: the division of labour in the family (p (dl) = 0.02), men's superiority in the labour market (p (dl) < 0.001) and unequal gender roles in politics (p (dl) = 0.02). After adjusting for variables of model 1, there was also evidence for departure from linearity for these attitudes (p (dl) = 0.03, < 0.001 and 0.03, respectively). P-values for departure from linearity in model 2 were unchanged from model 1. Rather than being linear, these associations were U-shaped, with the odds of depressive symptoms highest among participants strongly agreeing and strongly disagreeing.

Table 1 notes: aOR: odds-ratio adjusted for age group and sex (association of age group adjusted for sex, association of sex adjusted for age group). 95% CI: 95% confidence interval. ⁎ p-value from likelihood-ratio test for association.

Table 2
Crude and adjusted association of four measures of gender role attitudes (GRAs) with symptoms of depression (PHQ-9 score ≥ 5). OR: odds-ratio (aOR: adjusted odds-ratio). 95% CI: 95% confidence interval.
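The text does not state how the confidence intervals were obtained. Assuming the usual Wald intervals on the log-odds scale, an odds ratio and its 95% CI relate to the logistic regression coefficient and its standard error as shown below. The reported intervals are consistent with this assumption, since each estimate lies at the geometric midpoint of its limits; the worked check uses the model 1 estimate for men's superiority in the family.

```latex
\widehat{\mathrm{OR}} = \exp(\hat\beta), \qquad
95\%\ \mathrm{CI} = \exp\!\left(\hat\beta \pm 1.96\,\widehat{\mathrm{SE}}(\hat\beta)\right)

% Worked check for OR 1.19 (95% CI 1.00-1.42):
\hat\beta \approx \tfrac{1}{2}\left(\ln 1.00 + \ln 1.42\right) \approx 0.175, \qquad
\widehat{\mathrm{SE}}(\hat\beta) \approx \frac{\ln 1.42 - \ln 1.00}{2 \times 1.96} \approx 0.089, \qquad
\exp(0.175) \approx 1.19
```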

Association of GRAs with symptoms of GAD
Before and after adjusting for confounding, there was evidence that two types of GRAs were associated with symptoms of GAD: attitudes about the division of labour in the family and attitudes about men's superiority in the labour market (Table 3). Odds of symptoms of GAD were elevated among participants both absolutely agreeing and absolutely disagreeing with these types of GRAs. However, the confidence intervals for all ORs crossed 1. There was no evidence for an association of attitudes about unequal gender roles in politics with symptoms of GAD, either before or after adjusting for confounding. For attitudes about family leadership, there was no evidence for a crude association with symptoms of GAD, but there was evidence for an association after adjusting for variables of model 1. Odds were elevated among participants judging men superior in the family (model 2 OR 1.21 (95% CI 0.99-1.48)). They were also elevated among those judging women superior in the family (model 2 OR 1.45 (95% CI 0.92-2.31)), although both confidence intervals crossed 1. There was weak evidence for a similar association of attitudes about family leadership with symptoms of GAD after adjusting for variables of model 2.
Furthermore, we found evidence for departure from linearity of the association of attitudes about men's superiority in the labour market with symptoms of GAD (p (dl) = 0.01). This association was also U-shaped: odds of symptoms of GAD were highest among participants strongly agreeing and strongly disagreeing with this item. We did not find evidence for departure from linearity for the associations of any other type of GRAs with symptoms of GAD.

Effect measure modification of the association of GRAs with symptoms of depression by sex and age
There was weak evidence that age modified (interacted with) the associations of two attitudes with depressive symptoms: unequal gender roles in politics and family leadership. These associations were somewhat more pronounced among participants aged 35-54 than among those aged 55-69 (Table 4). We did not find evidence for effect measure modification by sex of the association of GRAs with depressive symptoms.

Effect measure modification of the association of GRAs with symptoms of GAD by sex and age
Likewise, we found weak evidence for effect measure modification by age of the associations of attitudes about unequal gender roles in politics and about family leadership with symptoms of GAD (Table 5). For attitudes about unequal gender roles in politics, ORs for symptoms of GAD were higher among older participants disagreeing with this item. The association of attitudes about family leadership with symptoms of GAD was stronger among participants under the age of 55. We did not find evidence for effect measure modification by sex of the relationship of any GRAs with symptoms of GAD.

Sensitivity analysis
In the sensitivity analysis, there was weak evidence that two types of GRAs were associated with moderate or severe depressive symptoms (PHQ-9 score ≥ 10): attitudes about men's superiority in the labour market and about unequal gender roles in politics (online supplement). Odds of moderate or severe depressive symptoms were elevated among participants both strongly agreeing and strongly disagreeing with these types of GRAs. Furthermore, there was evidence of elevated odds among participants agreeing that women should be the head of the family. There was no evidence for an association of attitudes about the division of labour in the family with moderate or severe depressive symptoms. Finally, there was no evidence for an association of any type of GRAs with moderate or severe symptoms of GAD (GAD-7 score ≥ 10) (online supplement).

Discussion
All measured types of GRAs were associated with any depressive symptoms in the past two weeks after adjusting for confounding. These were traditional and specialised GRAs of the private sphere and traditional GRAs of the public sphere. The pattern of results suggested a U-shaped association, with higher odds of depressive symptoms among those both strongly agreeing and strongly disagreeing with statements about GRAs. There was weak evidence of age differences in two groups: participants strongly agreeing with unequal gender roles in politics, and those believing that women should be the head of the family. In both groups, odds of any depressive symptoms were higher among participants under the age of 55 than among those aged 55 or over. Taking confounding into account, there was evidence of weak associations with any symptoms of GAD. These involved strong agreement with the division of labour in the family and with men's superiority in the labour market, as well as attitudes towards family leadership. Associations with symptoms of GAD were also U-shaped. Results for depressive symptoms were similar with a PHQ-9 cut-off of ≥ 10. In contrast, no evidence for an association of GRAs with GAD was found with a GAD-7 cut-off of ≥ 10.
Findings from this study are in line with studies showing that strong agreement with gender inequality and the division of labour in the private sphere is associated with psychological distress and symptoms of depression among both women and men (Arends-Tóth and van de Vijver, 2007;Glass and Fujimoto, 1994;Harryson et al., 2012;Hunt et al., 2006;Sweeting et al., 2014). Our results also agree with a study using RLMS data, which suggests that agreement with the division of labour in the family is associated with perceived control in a Russian context (Barrett and Buckley, 2009). To our knowledge, no previous study has described a U-shaped association of GRAs with CMDs. Furthermore, our results provide evidence that GRAs of the public sphere are also associated with CMDs.
However, the U-shaped association of GRAs with CMDs is difficult to interpret. Other social constructs have also shown U-shaped associations with CMDs. Some studies have described such an association for adherence to traditional religious life (Braam et al., 1999;King et al., 2007;Wei and Liu, 2013). Other such exposures include subjective social status, workplace social capital and work engagement (Aslund et al., 2009;Imamura et al., 2016;Sakuraya et al., 2017). One possible explanation for the U-shape is that support for gender inequality and complementarity is deeply embedded in Russian society. Deviating from this norm might then cause psychosocial stress, in addition to any stress arising from accepting inequality and complementarity (Ashwin and Lytkina, 2004;Janey et al., 2005;White, 2005).

Table 3
Crude and adjusted association of four measures of gender role attitudes (GRAs) with symptoms of generalised anxiety disorder (GAD-7 ≥ 5). OR: odds-ratio (aOR: adjusted odds-ratio). 95% CI: 95% confidence interval.
Effect measure modification by age might reflect the importance of gender roles in early adulthood, when childcare is a central part of the lives of many couples. A cohort study from Sweden associated perceived gender inequality among couples caring for children with psychological distress among women and men (Harryson et al., 2012). However, a study in the UK found stronger associations among older adults, which contradicts our finding of a stronger association among younger adults (Sweeting et al., 2014). Effect measure modification by age might also reflect generational differences in Russia. Participants under the age of 55 were under the age of 30 at the time of the collapse of the USSR. Gender roles may have meant something different to those who experienced social and political change in adolescence or early adulthood than to those who were older at the transition. Finally, it is surprising that we did not find effect measure modification by sex, as previous research has suggested differential effects (Hammarström and Phillips, 2012;Mayor, 2015;Sweeting et al., 2014).
An important limitation of this study is the possibility of reverse causality of CMDs on GRAs due to the cross-sectional study design. A review showed that gender-specific behaviour and the adoption of gender stereotypes begin to develop at the ages of 2-3 years (Martin and Ruble, 2010). Less solid evidence is available about the long-term stability of gender role perceptions. Some prospective studies have found perceptions about gender roles to be stable during adulthood (Cast, 1997;Kirchmeyer, 2002;Lucier-Greer et al., 2012;Marini, 2000). Furthermore, GRAs about roles within the home (private sphere) seem to be more stable over time than GRAs about roles in the public sphere (Garovich and Lueptow, 1995;White, 2005). Even if GRAs are stable in the long term, the results of our study do not allow conclusions about a causal effect of GRAs on symptoms of depression and GAD. In addition, depressive symptoms might already have been present in childhood or adulthood, influencing the development of GRAs from an early stage (Holzel et al., 2011).
Depression is also known to alter cognition, which might have affected response patterns if depressed participants tended to answer in a more extreme fashion (McCarty et al., 2007; Nolen-Hoeksema et al., 1992). KYH participants showed higher odds of depressive symptoms when they agreed with the extreme end of Likert-scaled variables indicating low self-perceived social trust and health efficacy (data not shown). However, no elevated odds of depressive symptoms were observed among participants agreeing with the opposite extremes of high self-perceived social trust and health efficacy. This observation suggests that depressed participants might have tended to agree with statements expressing a negative outlook, rather than answering in an extreme fashion in general. GRAs, in turn, are unlikely to be related to a negative outlook per se. We therefore consider the U-shaped association of GRAs with symptoms of CMDs unlikely to be explained by response patterns due to depressed mood.
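The odds comparison described above can be illustrated with a crude odds ratio and its Woolf (log-based) 95% confidence interval from a 2×2 table. This is a simplified sketch: the study's own estimates come from multivariable logistic regression, and the function name and cell counts below are hypothetical, not the unpublished KYH results.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.959963984540054):
    """Crude odds ratio with a Woolf 95% CI.

    a: exposed with symptoms, b: exposed without symptoms,
    c: unexposed with symptoms, d: unexposed without symptoms.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: "exposed" = strongly agreeing with a negatively worded item.
or_, lo, hi = odds_ratio_woolf(a=20, b=80, c=10, d=90)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With these invented counts the point estimate is 2.25 but the interval (about 0.99 to 5.09) still includes 1, a reminder that elevated odds in sparse extreme categories can be imprecise.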
The PHQ-9 is a validated tool for detecting major depressive disorder; a score ≥ 10 showed optimal discriminatory ability, with a sensitivity of 80% and a specificity of 92% (Gilbody et al., 2007; Kroenke et al., 2001). However, reporting mild, moderate or severe symptoms of depression (a score ≥ 5) was considered the more relevant outcome in this study. Because the target population was the general population, mild depressive symptoms might account for a high burden of disability at the population level (Vos et al., 2016). In addition, mild depressive symptoms are clinically important and can require treatment (Hegerl et al., 2012). The same applies to the GAD-7 scale, which has been validated to screen for GAD in clinical and research settings (Spitzer et al., 2006). Sensitivity analyses using a higher cut-off of the PHQ-9 and GAD-7 showed a similar U-shaped association with GRAs, which supports the robustness of our findings to the choice of cut-off.
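The trade-off between cut-offs can be made concrete by dichotomising questionnaire totals and comparing the result with a reference diagnosis. This sketch assumes PHQ-9 totals (range 0-27) and a hypothetical structured-interview diagnosis as the reference standard; the function names, scores and labels are invented for illustration and are not KYH data.

```python
def screen_positive(total: int, cutoff: int) -> bool:
    """Binary outcome as used for PHQ-9 / GAD-7 totals: positive if total >= cutoff."""
    return total >= cutoff


def sensitivity_specificity(totals, has_disorder, cutoff):
    """Return (sensitivity, specificity) of the cut-off against a reference diagnosis."""
    tp = fn = tn = fp = 0
    for total, disorder in zip(totals, has_disorder):
        positive = screen_positive(total, cutoff)
        if disorder and positive:
            tp += 1
        elif disorder:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both reference-positive and reference-negative cases")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical PHQ-9 totals and reference diagnoses (first six without, last four with).
totals = [2, 3, 4, 6, 7, 9, 8, 11, 12, 15]
has_disorder = [False] * 6 + [True] * 4
for cutoff in (5, 10):
    sens, spec = sensitivity_specificity(totals, has_disorder, cutoff)
    print(f"cut-off >= {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these invented data, the ≥ 5 cut-off identifies every reference case (sensitivity 1.00) but flags half of the non-cases (specificity 0.50), whereas ≥ 10 gives 0.75 and 1.00, the same direction of trade-off described above.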
aOR: odds ratio adjusted for study site, age group, sex, nationality and education. 95% CI: 95% confidence interval. ⁎ p-value for interaction from likelihood-ratio test.
Concerning measurement of the exposure, reporting bias could have been introduced. Men might have given socially desirable answers because all interviewers were female. Data reduction methods might have reduced measurement error in GRAs. However, poor correlations and poor internal consistency of the items indicated that they measured different dimensions of GRAs. Furthermore, although the items are used in the RLMS, the World Values Survey and the International Social Survey Programme, no validated score for measuring GRAs is available (Constantin and Voicu, 2015). In addition, the items used do not cover all theoretical types of GRAs in the private and public spheres, which limits content validity (Constantin and Voicu, 2015).
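Internal consistency of a set of items is commonly summarised with Cronbach's alpha: alpha = k/(k-1) × (1 - sum of item variances / variance of the summed score), where k is the number of items. The section does not state which statistic was used, so applying alpha here is an assumption; the function name and the responses below are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items answered by the same n respondents.

    items: list of k sequences, each holding n numeric responses in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("each item needs the same number (>= 2) of responses")
    totals = [sum(item[j] for item in items) for j in range(n)]  # summed score per respondent
    sum_item_var = sum(variance(item) for item in items)  # sample variances
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


# Hypothetical 4-point Likert responses from six respondents to three GRA items.
items = [
    [1, 2, 4, 3, 2, 4],
    [4, 1, 2, 2, 3, 1],
    [2, 4, 1, 3, 4, 2],
]
print(f"alpha = {cronbach_alpha(items):.2f}")
```

Values near 1 indicate items that move together; low or even negative values, as these invented responses produce, are consistent with items tapping different dimensions.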
Theorists have pointed out that the concept of gender roles itself should be questioned. The concept might imply that accepting a role is an individual choice; this theoretical approach therefore neglects structural aspects and power relations and reinforces a binary framework of gender (Connell, 1987, 2012). Translating sociological theories of gender, such as relational theory, into social epidemiology is needed to develop models of possible links between gender and health that are consistent with gender theory (Connell, 2012).
There are several pathways that might lead from GRAs to CMDs. Accepting gender inequality and the division of housework is associated with women performing household chores alone, especially if their educational level is lower than that of their spouses (Carriero and Todesco, 2018). Furthermore, accepting gender inequality and complementarity might serve to justify discrimination at the workplace and intimate partner violence (Flood and Pease, 2009; Tran et al., 2016; Verniers and Vala, 2018). These factors might cause chronic exposure to psychosocial stress, which might lead to the manifestation of CMDs (Rosenfield and Mouzon, 2013). However, elucidating the pathways between GRAs and CMDs is beyond the scope of the cross-sectional analyses presented here, and further research is needed to test these hypotheses.
To conclude, our research is in line with recent studies suggesting a relationship of GRAs with prevalent symptoms of CMDs. Despite the methodological limitations of our study, we argue that a more thorough consideration of gender inequality and complementarity as social determinants of CMDs is warranted (Hammarström et al., 2009; Hammarström and Phillips, 2012; Neitzke, 2016). More research on gender inequality and complementarity might facilitate the development of new explanatory models of CMDs (Hammarström et al., 2009). Recently, scholars have called for interventions on restrictive gender norms to improve a variety of health outcomes (Gupta et al., 2019). Gender norms might be amenable to change; however, evidence of causality needs to be established before such interventions can be put into practice.
Further research to unpack this relationship should use qualitative and quantitative methods to study possible pathways of embodiment from GRAs to mental health. Prospective studies are needed to rule out reverse causality. Such studies should apply sound theoretical conceptualisation and validated measurement tools for relevant aspects of gender inequality and complementarity at the micro and macro levels, to strengthen the evidence and enable consistent comparisons of research findings. Much of the research to date comes from secondary data analyses; studies investigating this association should therefore be conceptualised for this purpose at the design stage. Finally, further evidence from non-western settings is needed to assess the relevance of context.

Funding
The International Project on Cardiovascular Disease in Russia (IPCDR) was supported in part by a Wellcome Trust Strategic Award [100217]. The project was also funded by UiT The Arctic University of Norway in Tromsø, the Norwegian Institute of Public Health and the Norwegian Ministry of Health and Social Affairs. PJ received a scholarship from the German Academic Exchange Service.

Role of the funding sources
Funders were not involved in study design; in the collection, analysis and interpretation of data; in the writing of the report; or in the decision to submit the article for publication.

Declarations of Competing Interest
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us.
We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing we confirm that we have followed the regulations of our institutions concerning intellectual property.
We further confirm that any aspect of the work covered in this manuscript that has involved either experimental animals or human patients has been conducted with the ethical approval of all relevant bodies and that such approvals are acknowledged within the manuscript.