Gender differences in empathizing-systemizing - the influence of gender stereotype and socially desirable responding

The main aim of the present studies was to investigate the influence of social and cultural factors on gender differences in empathizing-systemizing. Study 1 was designed to control for socially desirable responding in gender differences in empathizing-systemizing. In Study 2 we investigated whether the activation of a gender stereotype would influence gender differences on the questionnaire and the ability test that measured empathizing-systemizing. Consistently across our two studies and the two measurement methods used (the questionnaire and the ability test), women scored higher in empathizing and the size of the effect was medium. Socially desirable responding had no effect on the size of gender differences in empathizing. However, the activation of the gender stereotype made respondents, especially women, present themselves as more empathetic. In addition, the stereotype activation produced a performance boost on the systemizing ability test in men, whereas no effect was observed in women.


INTRODUCTION
Although the historical underpinnings of the empathy construct go back to German psychology of the 19th century, recent decades have witnessed a renewal of interest in empathy (e.g. de Waal 2008; Iacoboni 2009). A substantial number of recent papers have stressed the evolutionary origin of empathy rather than its cultural roots. Research by de Waal (2008, 2009), for instance, reveals empathy as a mechanism common to many species of mammals, not only primates. The current research focuses on testing Baron-Cohen's empathizing-systemizing theory of gender differences in cognition (Baron-Cohen 2002).
The construct of empathy has traditionally been regarded as multidimensional and defined as (i) the cognitive ability to know the mental states of others, (ii) the ability to experience emotional reactions that are appropriate to the mental states of others, and (iii) the ability to act upon this knowledge and these emotions, e.g. consoling the distressed or helping the suffering (Bateson 2009). Among various attempts to capture the multidimensionality of empathy, Baron-Cohen's construct of empathizing stands out as a well-grounded proposal (Baron-Cohen 2009; Baron-Cohen and Wheelwright 2004). It has been developed as a part of the empathizing-systemizing theory (Baron-Cohen 2009). The theory explains both autism spectrum disorder and gender differences in empathizing-systemizing. Empathizing encompasses the aforementioned multidimensionality of empathy, but is also related to the theory of mind and taking the intentional stance. Empathizing consists of two components: the ability to attribute mental states to other people and the desire to respond to those mental states with appropriate emotions and actions (Baron-Cohen 2004; Baron-Cohen and Wheelwright 2004). Systemizing is defined by Baron-Cohen as a drive to analyse or create systems, i.e. rule-governed bodies of knowledge that can refer to mechanical, abstract or numerical domains (Baron-Cohen 2003, 2006).
Emotional intelligence, which has generated an extensive amount of research, is a construct related to empathizing. The original idea of emotional intelligence (Salovey and Mayer 1990) was that some persons are better than others at recognizing emotions and using them to enhance their thinking and action. Yet other researchers have conceptualized emotional intelligence in broad and eclectic ways and included not only abilities but also dispositional traits such as self-esteem, optimism or self-management (Bar-On 2004; Petrides and Furnham 2001).
It seems that the approach to emotional intelligence that is closest to the construct of empathy is the original one of Salovey and Mayer: the ability approach. The research on emotional intelligence that is particularly important for this study concerns gender differences in this domain. A recent study performed on a large sample of adults supported the idea of gender differences in ability emotional intelligence, i.e. women tend to outperform men, although the effect size is rather small (Cabello et al. 2016).
There is evidence suggesting that not only empathy but also the empathizing-systemizing dimension is related to biological factors (Baron-Cohen 2006). According to the empathizing-systemizing theory of Baron-Cohen (2002), persons with autism spectrum disorder lie at the extreme ends of the normal distributions of both empathizing and systemizing (i.e. very low empathizing and high systemizing). Since autism spectrum disorder is more prevalent in males than in females, this pattern of low empathizing and high systemizing has been referred to as the extreme male brain hypothesis of autism (Baron-Cohen 2002). In a recent study (Baron-Cohen et al. 2014) individuals with autism showed no gender difference in performance on a test of the cognitive aspect of empathy, whereas in the healthy control group females performed significantly better on this test than males. This result corroborated the extreme male brain hypothesis of autism.
Related to the assumption of biological foundations of empathizing-systemizing is the notion that there are gender differences in that domain. According to Baron-Cohen (2003), women score higher on empathizing and lower on the systemizing dimension when compared to men. This difference has been confirmed in numerous studies (Baron-Cohen, Knickmeyer and Belmonte 2005; Baron-Cohen and Wheelwright 2004; Goldenfeld, Baron-Cohen and Wheelwright 2005). Baron-Cohen (2003) proposes that the female brain is predominantly hard-wired for empathy or social communication skills, whereas the male brain is hard-wired for focusing on how systems work and trying to build them. Although the exact biological mechanisms that underlie empathizing-systemizing are still to be discovered, there are some data that point to the biological nature of gender differences on these dimensions (Chakrabarti, Bullmore and Baron-Cohen 2006; Knickmeyer, Baron-Cohen, Raggatt and Taylor 2006). An important way of showing biological influence on gender differences is to demonstrate that these differences are constant across different cultures. In an important cross-cultural study, in which four different samples from Malaysia, Slovenia, Switzerland and Turkey were compared (Zeyer et al. 2013), a very stable effect of systemizing on motivation to study science was detected across all four cultures. Their structural model showed that gender influenced the motivation to study science only indirectly, via systemizing, but the effect size of gender differences in empathizing-systemizing was similar to that obtained in European samples (Groen et al. 2015).
Only a few studies conducted so far have shown that gender differences in empathizing-systemizing are not entirely immune to cultural influences (Berthoz, Wessa, Kedia, Wicker and Grezes 2008; Groen, Fuermaier, Heijer, Tucha and Althaus 2015; Preti et al. 2011; Zeyer et al. 2013; Zheng and Zheng 2015), especially when participants from the West and Far East are compared. These studies suggest that gender differences in empathizing are comparable in European samples (Groen et al. 2015; Preti et al. 2011) and similar in a Japanese sample (Wakabayashi et al. 2007), whereas there were no gender differences in empathizing in a Korean sample (Kim and Lee 2010). Similarly, recent data from a Chinese study (Zheng and Zheng 2015) revealed no gender difference in empathizing, which suggests the existence of culturally determined patterns. In addition, systemizing has shown more consistent and stronger gender differences in Asian populations (Wakabayashi et al. 2007; Zheng and Zheng 2015) compared to European ones.
There are several aspects of social and cultural factors that could potentially influence empathizing-systemizing (Baron-Cohen 2006; Baron-Cohen et al. 1996; Wakabayashi et al. 2007). One of them that has not been investigated so far in relation to empathizing-systemizing is stereotype threat. An important form of stereotype is the gender stereotype, which refers to a set of generalized characteristics ascribed specifically to men or to women. The threat of gender stereotype has been documented to decrease or enhance the performance of women and men on various measures (Aronson and Steele 2005; Steele 1997; Wheeler and Petty 2001), including behavioural and cognitive tasks (Grand, Ryan, Schmitt and Hmurovic 2011; Seibt and Förster 2004) and questionnaires (von Hippel et al. 2005). Since empathizing is closer to a stereotypical view of female skills and systemizing is linked more to male technical talents, one could expect an influence of gender stereotype on self-perceived abilities in empathizing-systemizing.
Over the past twenty years research has suggested that the mere existence of negative stereotypes is enough to create an intellectual environment which undermines the performance of those stigmatized. This negative effect has been named 'stereotype threat' (Steele 1997) and is defined as a situational predicament in which persons are at risk, through their actions, of confirming negative stereotypes about their groups (Steele and Aronson 1995). The research on stereotype threat suggests that a person who is aware of a negative stereotype impugning his or her intellectual ability, and who is in a situation that requires him or her to show that ability, may fear confirming the stereotype, and that fear may jeopardize his or her performance (Schmader, Johns and Forbes 2008). More specifically, the possible mechanisms that compromise the performance of a person under stereotype threat are reduced working memory, rumination on one's own ability, and unsuccessful efforts to regulate one's thoughts and emotions (Schmader, Johns and Forbes 2008). A parallel to stereotype threat is the idea of 'stereotype boost', which has been conceptualized as an influence of positive stereotypes on a person's actions that can result in increasing his or her performance (Shih, Pittinsky and Ho 2011). In the case of stereotype boost, the possible mechanisms that facilitate the performance boost are reduced task-related anxiety, increased expectations, increased efficiency in neural processing, and greater persistence in task completion (Shih, Pittinsky and Ho 2011).
Another social factor, which has not been controlled in studies of empathizing-systemizing but has been considered mainly when checking the reliability and validity of the empathizing-systemizing questionnaires, is socially desirable responding. Self-reports of empathizing-systemizing can potentially be contaminated by social desirability and self-favouring biases. Both the empathizing and the systemizing scales contain items that have a clearly positive, i.e. socially desirable, interpretation and, what is more, some of these items are more positive for women and some are more positive for men, so gender differences in empathizing-systemizing can be contaminated as well. The four studies in which correlation analysis was performed between desirable responding scores and different measures of empathy, including empathizing (Baldner and McGinley 2014; Lawrence, Shaw, Baker, Baron-Cohen and David 2004; Preti et al. 2011; Trent, Park, Bercovitz and Chapman 2016), have shown a significant (although weak) relationship between them, which suggests that controlling for socially desirable responding may influence the empathizing-systemizing scores in general, and gender differences in these dimensions in particular.
In addition, in all the aforementioned studies socially desirable responding has been measured exclusively with the Social Desirability Scale (Crowne and Marlowe 1960). However, there has been an interesting development in the area of socially desirable responding since Crowne and Marlowe's original work (see Paulhus 1984; Paulhus and Reid 1991; Roth, Harris and Snyder 1988; Sackeim and Gur 1978). For example, research has revealed the existence of two separate aspects of socially desirable responding: a conscious tendency to present oneself in a favourable light, and a more implicit tendency to do so (Paulhus 1984, 1998). Factor analytic studies have consistently supported the existence of the two factors, labelled Alpha (Block 1965) and Gamma (Wiggins 1964). Paulhus (1984, 1986) provided the most extensive evidence of the existence of these two factors, which he labelled 'self-deception' and 'impression management', respectively. The most widely validated current measure of socially desirable responding which captures the two factors is the Balanced Inventory of Desirable Responding (Paulhus 1989). It has become a common and recommended measure of the two-factor model of desirable responding (see meta-analysis by Li and Bagger 2006).
The main aim of the present studies was to investigate the possible moderating influence of social and cultural factors on gender differences in empathizing-systemizing in healthy participants. The rationale behind this aim was that tests of hypotheses about the possible biological/evolutionary origins of psychological mechanisms have seldom considered alternative explanations, i.e. cultural factors (see Simpson and Campbell 2016, for a similar argument). We take a Popperian stance in this study, i.e. the best way to support a hypothesis is to try, and fail, to falsify it. If the gender differences in empathizing-systemizing are not explained by social and cultural factors, the hypothetical biological or evolutionary roots of the differences can be treated as probable. Study 1 was designed to control for socially desirable responding in gender differences in empathizing-systemizing, in order to exclude this factor as a possible explanation of gender differences. To the best of our knowledge, socially desirable responding has not been controlled in any published study that examined gender differences in empathizing-systemizing. Also, the Balanced Inventory of Desirable Responding (Paulhus 1989) has not been used so far to analyse the relationship between the empathizing-systemizing questionnaires and socially desirable responding. In Study 2 we investigated whether the activation of a gender stereotype would influence gender differences in performance in the domain of empathizing and systemizing measured by self-report and ability instruments. Although a strong effect of gender stereotype has been documented in many domains (see Aronson and Steele 2005 for a review), there has been no attempt so far to test its influence in the empathizing-systemizing domain.
If confirmed, the gender stereotype effect on performance in empathizing-systemizing could have important implications for the diagnostic validity of the instruments that measure empathizing-systemizing when they are used to detect persons on the autism spectrum. Also, a gender stereotype effect would call for greater caution in corroborating sex differences that allegedly have biological or evolutionary origins.

STUDY 1
To control for socially desirable responding in gender differences in empathizing and systemizing, we administered questionnaires developed by Baron-Cohen and collaborators (Baron-Cohen and Wheelwright 2004; Wheelwright et al. 2006) and Paulhus' Balanced Inventory of Desirable Responding (Paulhus 1989). We expected that desirable responding would influence gender differences, and therefore controlling for desirable responding would reduce the effect size of gender differences. We expected this reduction to be rather small. Our prediction of a small decrease in gender differences was based on the presumed biological foundations of gender differences in empathizing and systemizing (Baron-Cohen 2009). It was also based on significant but low correlations between empathizing-systemizing and socially desirable responding revealed in the studies in which Baron-Cohen and collaborators' questionnaires of empathizing/systemizing and the Marlowe-Crowne Social Desirability Scale were used (Baldner and McGinley 2014; Lawrence, Shaw, Baker, Baron-Cohen and David 2004; Preti et al. 2011; Trent, Park, Bercovitz and Chapman 2016).

METHOD

PARTICIPANTS
A total of 93 volunteers (60 women) were recruited from various departments of the Humanities of the Jagiellonian University. They were given extra credit for their participation. The mean age of the participants was 21.3 years (SD = 1.89), ranging from 19 to 29.

MATERIALS AND PROCEDURE
The Empathy Quotient (EQ, Baron-Cohen and Wheelwright 2004) was used to assess empathizing. The original Empathy Quotient was a self-report questionnaire with 60 items, including 20 filler items. However, the version that is currently recommended on the Cambridge website uses only the 40 interpretable items. The Empathy Quotient has high reliability and validity indices (Baron-Cohen and Wheelwright 2004; Lawrence, Shaw, Baker, Baron-Cohen and David 2004), with a Cronbach's α of .92 and a test-retest correlation of .84. For the purpose of the present research a Polish version of the 40-item EQ (without filler items) was prepared. The translation was checked by a bilingual specialist in psychology. A 4-point Likert-type scale was used in the questionnaire. Participants received 1 or 2 points for an "empathic" response (slightly agree and strongly agree, respectively) and 0 for the two other responses. The Cronbach's α for the Polish version of the Empathy Quotient was .81.
The Systemizing Quotient (SQ, Baron-Cohen, Richler, Bisarya, Gurunathan and Wheelwright 2003) was used to assess systemizing. The original Systemizing Quotient was a self-report questionnaire with 60 items (including 20 filler items). Items in the original Systemizing Quotient were drawn primarily from traditionally male domains. To improve the Systemizing Quotient in this respect, a revised version (SQ-R) was developed (Wheelwright et al. 2006) that includes new items that are relevant to females in the general population. The Systemizing Quotient Revised has high reliability and validity indices (Wheelwright et al. 2006), with a Cronbach's α of .90. For the purpose of the present research a Polish version of the 75-item Systemizing Quotient Revised was prepared. The translation was checked by the same bilingual specialist in psychology. A 4-point Likert-type scale was used in the questionnaire. Participants received 1 or 2 points for the "systemizing" response (slightly agree and strongly agree, respectively) and 0 for the two other responses. The Cronbach's α for the Polish version of the Systemizing Quotient Revised was .83.
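The scoring rule described for the two questionnaires can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numeric response codes and the function names are hypothetical, and the sketch follows the description above literally (points for agreement only), so it does not handle any items that might be keyed in the opposite direction.

```python
# Hypothetical response codes for the 4-point Likert-type scale;
# these codes are an assumption of this sketch, not part of the instruments.
STRONGLY_AGREE, SLIGHTLY_AGREE, SLIGHTLY_DISAGREE, STRONGLY_DISAGREE = range(4)


def item_points(response):
    """Return 2 for 'strongly agree', 1 for 'slightly agree', 0 otherwise."""
    if response == STRONGLY_AGREE:
        return 2
    if response == SLIGHTLY_AGREE:
        return 1
    return 0


def total_score(responses):
    """Sum the item points over one participant's responses."""
    return sum(item_points(r) for r in responses)


# A participant who strongly agrees with all 40 EQ items reaches the
# maximum of 80 points; the 75-item SQ-R has a maximum of 150 points.
print(total_score([STRONGLY_AGREE] * 40))
print(total_score([STRONGLY_AGREE] * 75))
```

Under this rule both disagreement options contribute nothing, so the total reflects only how often, and how strongly, a participant endorses the keyed response.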
The last instrument used in the study was Paulhus' Balanced Inventory of Desirable Responding (Paulhus 1984). It is a self-report measure of a tendency to provide positive self-descriptions, with satisfactory reliability and validity indices (Paulhus 1984, 1989). The questionnaire captures two distinct aspects of socially desirable responding: a more implicit tendency, i.e. self-deceptive enhancement (SDE, 20 items), and a more conscious one, i.e. impression management (IM, 20 items). The reliability of the Polish version of the questionnaire was established in a previous study by Niedźwieńska and Neckar (2013) and was satisfactory: Cronbach's α was .79, .70, and .69 for the entire scale, the SDE subscale, and the IM subscale, respectively. The three questionnaires were administered during the same session and their order of administration was counterbalanced.
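The reliability coefficients reported for the questionnaires are Cronbach's α values. As a reminder of what this coefficient summarizes, the following minimal sketch computes it from a participants-by-items table using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The function name and the toy data are hypothetical and are not taken from the studies.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-participant lists of item scores.

    All inner lists must have the same length (one entry per item).
    Sample variances (n - 1 in the denominator) are used throughout.
    """
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data (hypothetical): four participants, two items scored 0-2.
toy = [[2, 1], [1, 1], [0, 0], [2, 2]]
print(round(cronbach_alpha(toy), 2))
```

The coefficient approaches 1 when items rise and fall together across participants, and it drops as the items become less consistent with one another.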

DESIGN
The design was a simple factorial design, with gender as a between-subjects factor and desirable responding as a covariate.

RESULTS AND DISCUSSION
For all statistical tests reported below, the rejection level was set at .05 (unless otherwise specified).
To assess gender differences in empathy, a one-way ANOVA with gender as a factor and Empathy Quotient as a dependent variable was performed. It revealed significant gender differences, F(1, 91) = 15.81, ηp² = .15, with women scoring higher (M = 44.35, SD = 9.56) compared to men (M = 36.94, SD = 6.46). Next, two one-way ANCOVAs were conducted to determine the difference between women and men on the Empathy Quotient while controlling for Self-Deceptive Enhancement and Impression Management. There was a significant effect of gender on the Empathy Quotient after controlling for Self-Deceptive Enhancement, F(1, 90) = 16.35, ηp² = .15, and Impression Management, F(1, 90) = 13.98, ηp² = .13. Women scored higher compared to men when controlling for Self-Deceptive Enhancement (M = 44.41, SD = 9.56; M = 36.84, SD = 6.46, for women and men respectively) and for Impression Management (M = 44.26, SD = 9.56; M = 37.10, SD = 6.46, for women and men respectively).
In the following step, gender differences in systemizing were assessed. First, a one-way ANOVA with gender as a factor and the Systemizing Quotient Revised as a dependent variable was calculated, and it revealed no gender differences, F(1, 91) = .86, p = .36, ηp² = .01. Next, the effect of desirable responding on the Systemizing Quotient Revised was established by conducting two one-way ANCOVAs with Self-Deceptive Enhancement and Impression Management scores as covariates. In neither analysis was the gender effect significant: F(1, 90) = .42, p = .58, ηp² = .005 for Self-Deceptive Enhancement, and F(1, 90) = 2.50, p = .12, ηp² = .03 for Impression Management.
Summing up, we found the expected gender differences in empathizing (medium effect size), with women outscoring men. However, the influence of Impression Management on gender differences in empathizing was negligible, i.e. the effect size for gender differences was still medium when controlling for Impression Management. Similarly, Self-Deceptive Enhancement had no influence on gender differences in empathizing, i.e. the effect size for gender differences was the same with and without controlling for Self-Deceptive Enhancement. We could not analyse any influence of socially desirable responding on gender differences in systemizing, as we found no gender differences on this dimension at all.
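The partial eta squared values reported with each F test can be recovered from the F statistic and its degrees of freedom, which is a convenient way to cross-check the effect sizes. The sketch below uses the standard identity: partial η² = (F · df_effect) / (F · df_effect + df_error). The function name is hypothetical; the F values and degrees of freedom are those reported above for the Empathy Quotient.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Gender effect on the Empathy Quotient, Study 1.
print(round(partial_eta_squared(15.81, 1, 91), 2))  # ANOVA: 0.15
print(round(partial_eta_squared(16.35, 1, 90), 2))  # controlling for SDE: 0.15
print(round(partial_eta_squared(13.98, 1, 90), 2))  # controlling for IM: 0.13
```

The recomputed values match the reported ones, which illustrates why the gender effect is described as essentially unchanged after the covariates are introduced.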

STUDY 2
Study 2 investigated the effect of the activation of gender stereotype on gender differences in the questionnaire and the ability test that measured empathizing-systemizing. In one condition the questionnaire and the ability test were administered without any special instructions in both the female and male groups. In the other condition the stereotype of women being better at emotion recognition and men being better at technical problems was activated just before administering the questionnaire and the ability test. We hypothesized that gender differences would be larger after the activation of gender stereotype. Specifically, we expected that the stereotype activation would increase the results of women in empathizing and the results of men in systemizing. For the questionnaires the possible mechanism would be the increased tendency to depict oneself in accordance with gender stereotype. For the ability tests the change in performance would reflect the increased motivation to do well on a task that fits the stereotype well.

METHOD

DESIGN
The design was a 2 x 2 factorial design, with group (experimental vs. control) and gender as between-subjects factors.

PARTICIPANTS
A total of 104 participants (52 females) were recruited and randomly assigned to the experimental and control conditions. There were 24 women and 26 men in the experimental condition. They were all students of various humanities and engineering faculties of several universities in Southern Poland. Members of the humanities and engineering faculties were equally distributed across gender as well as across control and experimental groups. The mean age of the participants was 23.08 years (SD = 2.22).

INSTRUMENTS
Two of the instruments used in Study 2 were the same as in Study 1: Empathy Quotient and Systemizing Quotient Revised. We calculated reliability for each of them in Study 2. Reliability of Empathy Quotient was similar to that of Study 1: Cronbach α = .82. Reliability of Systemizing Quotient Revised was higher, Cronbach α = .90, compared to Study 1.
To measure empathy more objectively, the Reading the Mind in the Eyes test (henceforth, Eyes test) was also administered. This is a 36-item measure of the human capacity to decipher a mental state from a picture of the eye region. It has been designed as an advanced measure of mind-reading. For each item the Eyes test offers four mental-state descriptions, consisting of the correct target word and three incorrect foil words. The translations of all descriptions were checked by a bilingual (English and Polish) specialist in psychology. The reliability of the scale was not originally reported by Baron-Cohen. In the present study Cronbach's α equalled .58, which is very consistent with results obtained in other studies (Harkness, Jacobson, Duong and Sabbagh 2010; Voracek and Dressler 2006). The analysis of items showed that the accuracy of responses was more than 50% for most items, and only 5 items had an accuracy between 38% and 44%. This confirms that target words were chosen at higher rates than expected by chance.
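The claim that target words were chosen above chance can be illustrated with a simple one-sided binomial calculation. The sketch below is not the analysis reported in the study: it assumes, for illustration only, that all 104 participants answered the item in question and that guessing among the four descriptions yields a chance accuracy of .25. The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, n, p):
    """Probability of observing at least `successes` hits in n trials with hit rate p."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(successes, n + 1)
    )


n_participants = 104   # assumption: every participant answered the item
chance_rate = 1 / 4    # one target word among four descriptions
lowest_accuracy = 0.38 # lowest item accuracy reported for the Eyes test

hits = round(lowest_accuracy * n_participants)
p_value = binomial_upper_tail(hits, n_participants, chance_rate)
print(hits, p_value < 0.01)
```

Under these assumptions even the hardest item is answered correctly more often than guessing would predict, which is in line with the statement in the text.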
The Folk Physics Test (Baron-Cohen, Wheelwright, Spong, Scahill, and Lawson 2001), referred to below as the Intuitive Physics test, was administered to measure systemizing skills. The test comprises 20 items that depict various mechanisms reflecting the laws of physics. The items take the form of questions about what will happen next, with the answers in a multiple-choice format. In their original paper the authors of the test provided only information on the accuracy of responses on each item. In the present study we calculated the reliability of this test: Cronbach's α was .52, which is a rather low value. However, the percentage of correct responses was similar to that obtained by Baron-Cohen and his collaborators (2001), with correct performance of over 50% on the majority of items. Performance was lower (23% to 45%) on seven items only.

PROCEDURE
The questionnaires and tests were administered in small, gender-specific groups. First, participants were given the Empathy Quotient and the Systemizing Quotient Revised, which was followed by the administration of the Eyes test and the Intuitive Physics test. In the control condition participants were given the following instructions: "Our research concerns how different aspects of personality are related to an ability to solve problems. You will be given two personality questionnaires and two ability tests that measure cognitive skills in solving problems in various domains." In the experimental condition the instructions read: "Our research concerns how different aspects of personality are related to an ability to solve problems. You will be given two personality questionnaires and two ability tests that measure cognitive skills in solving problems in various domains. If you encounter any problems in solving the problems please do not take it personally, because women are known to be much better in recognizing emotions of others, whereas men are much better in solving technical problems."

RESULTS AND DISCUSSION
The Empathy Quotient responses were entered into a 2 (group: stereotype activated vs no activation) x 2 (gender) factorial ANOVA. The main effect of gender was significant, F(1, 100) = 9.37, ηp² = .09, with female participants scoring higher than males. Also, participants in the experimental condition scored higher in empathy than the controls, F(1, 100) = 6.29, ηp² = .06. The interaction was not significant, F(1, 100) = 1.12, p = .29, ηp² = .01. To test whether, as predicted, women in the experimental condition scored higher compared to women in the control condition, we performed a simple effects analysis, which revealed a significant difference, F(1, 100) = 6.33, ηp² = .06. There was no significant difference between experimental and control conditions for men. We also calculated a simple effects analysis for gender differences in the control and experimental groups to test the hypothesis that gender differences would be larger in the latter group. Women scored higher compared to men in the experimental group, F(1, 100) = 8.16, ηp² = .08. For the control condition the difference did not approach statistical significance, F(1, 100) = 2.09, p = .15, ηp² = .02 (see Figure 1).
Similarly, the Systemizing Quotient responses were entered into a 2 (group) x 2 (gender) factorial ANOVA. No significant effects were found, all ps > .10. There was, however, a gender difference in the experimental group that approached statistical significance, F(1, 100) = 3.70, p = .057, ηp² = .04, with women scoring lower than men.
Next, we examined the performance in the Eyes test. The main effect of gender was significant, F(1, 100) = 13.96, ηp² = .12. As expected, female participants scored higher (M = 24.31; SD = 4.13) compared to males (M = 21.35; SD = 3.89) in the test. The main effect of group and the interaction were not significant (Fs < 1).
To examine differences in performance on the Intuitive Physics test, the scores were entered into a 2 (group) x 2 (gender) factorial ANOVA. There was a main effect of gender, F(1, 100) = 21.44, ηp² = .18, reflecting better performance of male participants. The main effect of group was not significant (F < 1). However, these effects were qualified by a two-way interaction between gender and group, F(1, 100) = 7.34, ηp² = .07. As expected, male participants scored higher than females in the experimental condition, F(1, 100) = 25.94, ηp² = .21, but not in the control condition (p = .17). Men in the experimental condition scored higher than men in the control condition, F(1, 100) = 5.24, ηp² = .05. The condition did not have a significant effect for women, F(1, 100) = 2.38, p = .13, ηp² = .02 (see Figure 2).
The expected pattern of gender differences was partially supported in Study 2. As we expected, women in the stereotype activation condition scored higher compared to men, and also higher compared to women in the control condition, when empathizing was measured by the questionnaire. The effect of stereotype activation was not observed in the Eyes test. However, the test revealed significant gender differences overall (medium effect size).
Interestingly, a different pattern was revealed when the questionnaire and the ability test were used to measure systemizing. When measured by the questionnaire, systemizing showed only a weak gender difference in the experimental condition. In contrast, our hypotheses were fully confirmed when systemizing was measured by the ability test, i.e. we found stronger gender differences in the experimental condition as well as better performance of men in the experimental group compared to men in the control group.

GENERAL CONCLUSIONS
The present studies examined socio-cultural effects on gender differences in empathizing-systemizing. Generally, our studies add to previous reports that showed superiority of females in empathizing as measured by questionnaires and ability tests. Critically, our results extend previous findings by showing gender stereotype effects on gender differences in empathizing-systemizing that were particularly strong in the domain of systemizing.
First, our findings generally support the effect size of gender differences in empathizing that has been reported in previous studies (see Groen et al. 2015). Consistently across our two studies and the two measurement methods used (the questionnaire and the ability test), women scored higher and the size of the effect was medium. The pattern of results related to systemizing was inconclusive. Although the ability test revealed large gender differences in favour of men, the systemizing questionnaire showed only a weak gender difference in the gender stereotype activation group. A probable cause of the weak gender effect in the questionnaire scores might have been the use of the revised version of the Systemizing Quotient, which favours men to a lesser extent compared to the original version of the systemizing questionnaire (Wheelwright et al. 2006).
Second, socially desirable responding had no effect on the size of gender differences in empathizing. As we did not find gender differences in systemizing in Study 1, the possible influence of socially desirable responding on systemizing requires further data. This pattern of results, i.e. an effect size for empathizing similar to other studies on European (Groen et al. 2015; Preti et al. 2012) and Asian (Kim and Lee 2010) samples and the unchanged effect size while controlling for socially desirable responding, seems to be in accordance with the hypothesis put forward by Baron-Cohen (2009), according to which gender differences in empathizing are biologically determined.
However, the results of Study 2 suggest that gender differences in empathizing-systemizing are not immune to socio-cultural influences. The effect of gender stereotype activation was strong in the empathizing questionnaire and in the Intuitive Physics test. The former probably shows a self-presentational effect of gender stereotype activation on empathizing, which should be taken into account in future research.
There was no effect of stereotype activation on performance in the Eyes test, whereas this influence was strong and in the expected direction in the Intuitive Physics test. This somewhat perplexing pattern of results is nevertheless coherent with the results of many studies that investigated the influence of gender stereotypes on performance in female and male groups (Grand, Ryan, Schmitt and Hmurovic 2011). Women seem to be more susceptible to stereotype threat, whereas men seem to be more immune to it. Therefore, women tend to perform below their actual abilities on tasks in which men are presumably better, whereas the performance of men on tasks in which women are presumably better is not affected by the awareness of a gender difference in performance (Franceschini, Galli, Chiesi and Primi 2014; Seibt and Förster 2004). The results of Study 2 suggest that men were especially motivated to perform well in the Intuitive Physics task when their gender stereotype was activated. A possible mechanism that may have contributed to their better performance compared to the control group could be linked to interpreting the test situation as a challenge rather than a threat. If a person is more motivated while performing the Intuitive Physics test, there is a greater chance of success, as the items of the test require focusing and thinking hard to find a solution. The better performance of men on the Intuitive Physics test under gender stereotype activation is congruent with existing research on the stereotype boost, which is the result of exposure to positive stereotypes (Shih et al. 2002). There are specific mechanisms that may underlie this performance boost, i.e. reducing anxiety, increasing efficiency in neural processing, and activating ideomotor processes (Shih, Pittinsky and Ho 2012).
The absence of an effect of stereotype activation on the performance of women in the Eyes test could be explained by the greater difficulty of the Eyes test. Overall, women scored higher than men in this test, and it is likely that even if they experienced a stereotype boost, they were not able to significantly improve their performance. The Eyes test requires more intuitive knowledge, and thus it may be that women could not profit so much from the stereotype boost. The participant just needs to possess the ability to recognize what emotion a person in the picture is expressing. An alternative explanation of the lack of a stereotype activation effect on women refers to the observation that stereotype boost effects are stronger when stereotype activation takes a more implicit than explicit form (Shih, Pittinsky and Ho 2012).
Some limitations of the present studies should be acknowledged. The samples were rather small, which may explain why some of the effects were not significant. However, the effect size of gender differences in empathizing was similar to those obtained in other studies with bigger samples (Groen et al. 2015; Preti et al. 2011). Because the participants were university students, the interpretation of the results is limited to young adults. Also, some of the measures used were ability tests, and their reliability was rather low, which might have attenuated some of the effects.
Our findings suggest that gender stereotype activation can be a serious threat to the validity of results obtained by measuring the empathizing-systemizing dimension in female and male groups. As gender differences in empathizing become increasingly common knowledge, special care should be taken not to activate gender stereotypes in the context of research in which an empathizing questionnaire is used. In addition, similar precautions are recommended in the case of the Intuitive Physics test of systemizing. These caveats apply specifically in those clinical settings where the instruments of this study would be used as a part of autism spectrum diagnosis.
In conclusion, we found robust gender differences in empathizing that were not influenced by socially desirable responding. However, these differences were not entirely immune to social influences: the activation of the gender stereotype made respondents, especially women, present themselves as more empathetic persons. In addition, we found gender differences in systemizing, as measured by the ability test, that were also influenced by the gender stereotype activation. The stereotype activation improved the performance of men, which may be attributed to the stereotype boost, i.e. increasing their motivation to do well in those tests that fitted the male stereotype well.