Failures of Gricean reasoning and the role of stereotypes in the production of gender marking in French

We partly replicate von der Malsburg et al.'s (2020) recent experiments, which investigated the relationship between speaker expectations, gender stereotypes and language use in English, in a grammatical gender language: French. The results of our experiment show how the linguistic particularities of the English and French gender marking systems interact with speaker expectations and stereotypes to create different patterns of gender marking production. They also raise a puzzle for current theoretical and computational frameworks that formalize Gricean pragmatics, particularly those in which informativity (Gricean Quantity) is assumed to play a driving role in linguistic production.

CÉLINE POZNIAK

von der Malsburg et al. (2020) were interested in finding out whether the pronouns speakers use when referring to a woman in a traditionally male job could be affected by the stereotypes that speakers associate with this job. In their paper, the authors took the definition of a stereotype from Judd & Park (1993), that is, "a belief about the typical characteristics of a group" (von der Malsburg et al. 2020, p.1). Since our squib aims to partly replicate their experiments, we will keep their definition of stereotypes in the remainder of the paper.
According to von der Malsburg et al. (2020), if speakers' language use is guided by stereotypes in addition to (or instead of) their expectations about the social gender of the referent, this could result in a usage pattern that is biased in favour of he. From a political perspective, this biased pattern would be problematic since it would contribute to further reinforcing the male stereotypes associated with political leadership, and, in doing so, ultimately contribute to reproducing gender inequalities through language. From the perspective of linguistic theory, this biased pattern would be puzzling, since having language use be determined by stereotypes, rather than guided by Gricean Maxims like Quality (truth telling) and Quantity (informativity), would appear to make the use of gender marking qualitatively different from the use of other kinds of linguistic expressions, like scalar items (Grice 1975; Horn 1984), color terms (Frank & Goodman 2012), gradable adjectives (Lassiter & Goodman 2017), and action verbs (Bergen et al. 2016), among many others.
Based on the discussion in their paper, we can therefore oppose two competing hypotheses: the transparent hypothesis, which states that "linguistic preferences transparently reflect event expectations" (p.2), and the stereotype hypothesis, which proposes that gender-based stereotypes associated with the specific roles of the president and prime minister play a role in addition to event expectations. The transparent hypothesis boils down to the idea that gendered pronoun use is Gricean (guided by truth, informativity and rationality), whereas the stereotype hypothesis boils down to the idea that gender stereotypes can override truth and informativity, at least in some cases.

1 Both for space reasons and because we believe that the theoretical point is particularly pressing for these approaches, we only discuss the RSA implementations of these ideas in this squib. However, similar points can be made about some other analyses in formal semantics, such as Sauerland (2008). There may be other formal semantic or syntactic analyses that avoid these problems, but we cannot catalog them in a detailed way in this squib.

Pozniak and Burnett Glossa: a journal of general linguistics DOI: 10.5334/gjgl.1310
Testing the predictions of these two hypotheses requires making assumptions about the semantics of English pronouns, namely how gender marking in English is related to social gender. Although they briefly explore other options (p.12), von der Malsburg et al. (2020) assume that the semantics of English pronouns is as in (1), where, following Cooper (1983) and Heim & Kratzer (1998), we represent the semantic contribution of gender marking as a presupposition on the pronoun: pronouns denote functions that return an individual, provided that individual has the property stated in the gender presupposition.
von der Malsburg et al. (2020) also consider the possibility that he can refer to women, as in 'generic' uses like Every student brought his book. As described by Bodine (1975), the use of he in contexts where the gender of the referent is unknown or immaterial was prescriptively introduced around the 17th century, after which it coexisted for centuries with singular they and inclusive forms, before declining again since the 1970s (see Curzan 2003 for an overview). Although they briefly consider 'generic' he, von der Malsburg et al. (2020) argue that such an analysis is inconsistent with their results, and so discard this possibility. Note that, for many speakers, inclusive forms and they can naturally also refer to people whose gender falls outside the male-female binary (non-binary, gender-queer etc.). However, von der Malsburg et al. (2020)'s results did not take these into consideration.
The predictions of the transparent hypothesis, given (1), can be made explicit using the architecture of the Rational Speech Act model (Frank & Goodman 2012; Scontras et al. 2018).² Within this framework, it is easy to show that the transparent hypothesis predicts that participants should use the most informative (in the sense of Shannon 1948) pronoun to communicate their expectation about the social gender of the referent. More specifically, since the semantic denotation of he includes only men, he is the most informative pronoun for male social gender. Therefore, participants are predicted to favour he when they are (almost) certain that the future president/prime minister will be male. Since the denotation of she includes only women, participants are predicted to favour the feminine pronoun when they are (almost) certain that the referent will be female. Since the semantic denotations of they and inclusive forms include both men and women (as well as non-binary people), depending on how the transparent hypothesis is formalized, participants may be predicted to favour they or inclusive forms (which von der Malsburg et al. (2020) call gender hedged forms) when they are uncertain.³
The stereotype hypothesis predicts that participants should take into account something else besides truth and informativity when choosing which pronouns to use. This hypothesis predicts that they should deviate from the informativity-driven transparent pattern in a way that corresponds to the stereotypes that they associate with president or prime minister. In Misersky et al.'s (2014) norms, occupation nouns such as cabinet minister and president were rated as predominantly male professions: the estimated proportions of women for cabinet minister and president were 0.28 and 0.12 respectively for English, meaning that participants perceived only 28% and 12% of the people in these occupations to be women (and the ratings stayed relatively the same across the languages studied in Misersky et al. 2014).
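As a rough illustration of the informativity reasoning behind the transparent hypothesis, the following Python sketch implements a minimal Rational Speech Act speaker. It is not the authors' OSF code: the truth-value table standing in for the pronoun semantics, the uniform listener prior, the rationality parameter alpha, and the treatment of uncertainty as hesitation between a male and a female expectation are all assumptions made for illustration.

```python
# Hypothetical literal semantics (an assumption): 1.0 if the pronoun can
# truthfully describe a referent of that social gender, 0.0 otherwise.
SEMANTICS = {
    "he":   {"male": 1.0, "female": 0.0},
    "she":  {"male": 0.0, "female": 1.0},
    "they": {"male": 1.0, "female": 1.0},
}
STATES = ["male", "female"]
PRIOR = {"male": 0.5, "female": 0.5}  # assumed uniform listener prior


def literal_listener(utterance):
    """L0(state | utterance): the prior restricted to states where the utterance is true."""
    scores = {s: SEMANTICS[utterance][s] * PRIOR[s] for s in STATES}
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()}


def speaker(state, alpha=1.0):
    """S1(utterance | state), proportional to L0(state | utterance) ** alpha."""
    scores = {u: literal_listener(u)[state] ** alpha for u in SEMANTICS}
    total = sum(scores.values())
    return {u: v / total for u, v in scores.items()}


def production(p_female, alpha=1.0):
    """Mix the two speaker distributions by the expectation of a female referent."""
    s_f, s_m = speaker("female", alpha), speaker("male", alpha)
    return {u: p_female * s_f[u] + (1 - p_female) * s_m[u] for u in SEMANTICS}


predicted = production(0.9)
```

Under these assumptions, a participant with a 0.9 expectation of a female referent is predicted to produce she more often than they, and they more often than he; raising alpha makes they rarer, since they is never the most informative form for a particular social gender.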
3 The predictions for the use of they and inclusive forms depend a bit on how speaker uncertainty is modelled: if "uncertain" is treated as a separate category from "male expectation" and "female expectation", then the transparent hypothesis indeed predicts that they should be used under uncertainty. But if uncertainty is treated as hesitation between "male expectation" and "female expectation", then, since they is always less informative than he or she for a particular social gender, they/inclusive forms are predicted to be very rare. These points are outlined in the code on OSF.

Following previous work on gender stereotypes in language (Duffy & Keir 2004; Foertsch & Gernsbacher 1997; Misersky et al. 2014; Garnham et al. 2015: among others), von der Malsburg et al. (2020) further assume that speakers' gender stereotypes are not random, but can be influenced by experience. Although this experience can be direct, i.e. one can develop gendered mental representations associated with a noun like nurse through observing that the majority of nurses that one interacts with are female, this experience is often discursive, i.e. one can develop gendered mental representations from listening to the way that others talk about members of professions or other social categories.⁴ Although the relation between experience and stereotypes is complicated, given that stereotypes can be grounded in experience, the stereotype hypothesis may also predict that participants who have direct or discursive experience with both male and female leaders will develop different stereotypes from those whose leaders have always been male.
By this logic, since the US has never had a female president, participants in this country probably have a strong male stereotype for president, and deviation from the transparent pattern is predicted to be in favour of he in this country. The UK, on the other hand, has had a high profile female prime minister (Margaret Thatcher), and the incumbent, Theresa May, is also a woman. So UK participants probably have less strongly male stereotypes for their country's leaders than American participants. Therefore, the stereotype hypothesis predicts that deviation from the transparent pattern should be less favourable to he in the UK than in the US, and possibly include gender neutral forms like they.
The predictions of the stereotype hypothesis were borne out in the results. In the UK study, von der Malsburg et al. (2020) assumed that everyone would have a high degree of expectation (say at least 0.8) that Theresa May would win. As predicted by the transparent hypothesis, the proportion of she is higher than he at ≥ 0.8 degree of expectation. This result suggests that gendered pronoun use is, at least at some level, guided by Gricean principles. However, von der Malsburg et al. found that the form that is favoured at the highest degree of expectation that the Prime minister will be female is they (the least gender informative expression in the English pronominal paradigm); whereas, the transparent hypothesis predicts it should be she, since she is the most informative for female social gender. On the other hand, the deviation from the transparent pattern in the UK data to the benefit of they could be in line with the stereotype hypothesis, on the assumption that the stereotype UK participants associate with prime minister is gender neutral, thanks to prime ministers like Margaret Thatcher and Theresa May. Thus, von der Malsburg et al. (2020)'s results from the UK election present a first puzzle for formal pragmatics: What are the pragmatic mechanisms that allow gender stereotypes to override informativity in English pronoun use?
The results of the US experiment further complicate the puzzle. On the one hand, von der Malsburg et al. (2020) found that American participants' expectations about the social gender of the next president did play a large role in their pronoun use: as participants' expectation that the next president will be female rose, the proportion of their use of he declined. This again suggests that, at some basic level, informativity-based reasoning does underlie English pronoun production. However, the authors also found that, as participants' expectations of a female president rose, the proportion of she did not rise. In fact, unlike in the UK study, he remained a productive choice for participants even when they thought that it was likely that the next president would be female. Indeed, at all degrees of expectation, use of he remained higher than she in the American study, something which violates both Gricean Quantity and Quality. The increased use of he in the US compared to the UK is predicted by the stereotype hypothesis: since Americans have a strong male stereotype for president, this stereotype can result in the production of he even at high degrees of female expectation. This raises a puzzle for our pragmatic models that is parallel to the one raised by the UK data: What are the mechanisms that allow gender stereotypes to override informativity (and perhaps even truth)?
The US study showed an additional pattern that complicates the puzzles even more: as expectation in a female president rose, the pronoun whose rate increased was not she, but rather they. This is unexpected under both the transparent hypothesis and a simple stereotype hypothesis. If (1) is correct, they is the least informative expression in the English pronoun system, and yet it appears to be the perceived optimal way in which American participants choose to express expectations about a future female president. This is unlikely to be due to a stereotype effect, since the strong male stereotype is presumably what generates the high rate of he in the data. Instead what seems to be going on here is that the rise of they is an interaction between stereotypes and expectations: when participants have a high expectation that the referent is female with a strongly male stereotype noun, they need to combine these 'conflicting' expectations together. And apparently they is the pronoun that participants find optimal to resolve this conflict. This introduces the additional puzzle: What are the pragmatic mechanisms that allow the interaction between expectations and stereotypes to override informativity?
von der Malsburg et al. (2020)'s study strongly suggests that we need to integrate gender stereotypes into our models of gendered pronoun use. The question is how this should be done. Are the patterns that we see the product of general reasoning, or are they language specific? In order to investigate these questions, we turn to our replication of this experiment on French.

Gender marking in the French elections
French is a grammatical gender language, meaning that French grammar sorts nouns into two classes (masculine and feminine) that determine agreement patterns with other words in a sentence (Hockett 1958;Corbett 1991). Therefore, the mapping relations between grammatical gender and social gender are a bit different than in English. The first thing to note is that the question of whether there is a reliable relationship between grammatical gender and social gender is, itself, rather controversial. The view of many influential traditional grammarians, such as the Académie Française (1984,2004) and Grevisse-Goosse (2008), has been that grammatical gender does not reliably indicate social gender, especially for the masculine. On the other hand, feminist qualitative researchers have argued the contrary: that, in the vast majority of cases, noun phrases with masculine marking refer to men, while those with feminine marking refer to women (Violi 1987;Michard 1996;Houdebine 1998: among others).
In the past 15 years, a significant body of research in psycholinguistics has investigated this question for French, and the consensus that emerges from this work is that feminine marking reliably maps to female gender and masculine marking maps to male gender; however, this mapping is probabilistic (see Brauer 2008;Gygax et al. 2008;2012: among others). Building on the psycholinguistic work, we will therefore assume that the mapping between French grammatical and social gender (for human nouns) is as in (2): masculine gender maps to male social gender with a high (but not total) probability; whereas, feminine gender maps more consistently to female social gender. Written French also has a wide variety of inclusive forms (écriture inclusive) for noun phrases (le/la maire, etc.) and pronouns (il ou elle, il/elle, etc.) or inclusive, non binary words such as la personne (the person). There has been some psycholinguistic research on the interpretation of these forms (Chatard et al. 2005;Vervecken et al. 2015), which suggests that they map to both men and women. There is also some research, such as Greco (2013); Coutant et al. (2015); Elmiger (2017), showing that (some of) these inclusive forms are the choice option for some people to communicate gender that does not respect the male-female binary (non-binary, genderqueer etc.). This being said, since, as described in these works, non-binary and other minority social genders are not as widespread in France as in some other countries, we believe that the most realistic hypothesis is that the bulk of our participants have the form-meaning mapping shown in (2).
(2) Let N be a (pro)noun phrase, a.
Given (2), we can now formulate the predictions of the transparent and stereotype hypotheses for French, set within the Rational Speech Act model. The transparent hypothesis predicts that masculine should be used when the participants believe the referent is male, and that the proportion of feminine should rise as expectation in a female referent also rises. (2) does allow for the possibility that an expression with masculine marking can be used to refer to a woman, so it is consistent with masculine being used when participants are (almost) certain that the referent is female. However, because masculine grammatical gender is male-biased, this means that feminine grammatical gender is a more informative signal for female social gender. Therefore, even though the semantics of the French gender marking system in (2) does allow for some masculine to be used at high degrees of female expectation, the transparent hypothesis predicts that the rate of masculine should not exceed the rate of feminine in contexts where participants are certain that the referent is female.
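To make the prediction for French concrete, here is a small Python sketch of a speaker who chooses among masculine, feminine and inclusive forms under a graded form-meaning mapping. The numerical weights are illustrative placeholders chosen only to respect the qualitative description of (2) (masculine is male-biased but not exclusively male; feminine maps more consistently to female); they are not estimates from the paper or from its OSF code, and the uniform prior and the parameter alpha are likewise assumptions.

```python
# Assumed graded semantics; the numbers are illustrative placeholders only.
FRENCH = {
    "masculine": {"male": 1.0, "female": 0.3},
    "feminine":  {"male": 0.05, "female": 1.0},
    "inclusive": {"male": 1.0, "female": 1.0},
}


def speaker_probs(semantics, state, alpha=1.0):
    """Speaker choice probabilities for a given social gender of the referent."""
    listener = {}
    for form, meaning in semantics.items():
        # Literal listener under a uniform prior over the two states.
        listener[form] = meaning[state] / sum(meaning.values())
    weights = {form: prob ** alpha for form, prob in listener.items()}
    norm = sum(weights.values())
    return {form: w / norm for form, w in weights.items()}


certain_female = speaker_probs(FRENCH, "female")
certain_male = speaker_probs(FRENCH, "male")
```

With these placeholder weights, masculine keeps a non-zero probability for a female referent, but feminine is the more informative signal and is chosen more often, which is the pattern the transparent hypothesis leads us to expect when participants are certain that the referent is female.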
As in English, the stereotype hypothesis predicts that gender stereotypes associated with mayors should play a role in the production of gender marked expressions on top of participants' expectations. In Misersky et al. (2014)'s study, mayor in Swiss French was rated as a predominantly male profession (in their questionnaire, the estimated proportion of women was .27, meaning that participants perceived only 27% of the people in this occupation to be women). Again, since these stereotypes are based on direct or discursive experience, participants from different cities with different electoral histories are expected to have different stereotypes. In Marseille, the current mayor, Jean Claude Gaudin, is not seeking reelection because he is retiring after 25 years as the city's mayor. His former deputy, Martine Vassal, was favoured to win leading up to the first round of the election.⁵ In Paris, the incumbent is a woman: Anne Hidalgo. She was favoured to win leading up to the election.⁶ She is a self-described feminist and has made gender and sexuality a very salient aspect of her first term, holding public consultations with LGBT activist groups, hosting the 2019 Gay Games in Paris, and using inclusive forms on official signs in city hall. Given these different histories, Parisians are well placed to have a less strongly male stereotype for the leader of their city than Marseillais, so the stereotype hypothesis predicts more deviation from the transparent pattern in favour of the masculine in Marseille than in Paris.
Finally, we would like to know whether the French inclusive forms show the behaviour shown by English they at high degrees of female expectation: if using a gender neutral or inclusive expression is a general cognitive strategy to reconcile female expectation with male stereotype, we should expect to find an increase in inclusive forms as expectation in a female referent increases, particularly in Marseille. This being said, French écriture inclusive and English they have very different histories: gender neutral singular they has been widely used in English since the 14th century (Curzan 2003) and, in the syntactic context studied in this paper, has little social meaning. The forms of French écriture inclusive, on the other hand, appear to be innovations of the second half of the 20th century, becoming more widespread at the beginning of the 21st (see Abbou et al. 2018). They are still not universally accepted in written French and often communicate the political orientations of those who use them (Abbou 2017). So it remains to be seen whether we will find the same pattern in French as von der Malsburg et al. (2020) found in English.

Experiment
We ran an experiment during the first round of the 2020 municipal elections in Paris and Marseille. French elections usually have two rounds: a first one with all the candidates, and then, if no single candidate gets more than 50% of the vote, a second run-off round with the top candidates. The first round of municipal elections was held on March 15th, 2020, and the second round on June 28th. Our main experiment concerns the first round.

Design & materials
The experiment was the same in Paris and in Marseille (except for candidate names and city) and was partly adapted from von der Malsburg et al. (2020)'s experiment. It consisted of two parts. First there was a completion task in which participants saw only one item on the screen that was randomly attributed. Each item consisted of a context sentence (either with who or with the person who, (3)) and a sentence to complete ((4), see Appendix A in the OSF repository for the four other sentences used), leading to 10 possible combinations. We manipulated context sentences because we wanted to see whether there was a difference between the pronoun qui and the inclusive word personne (even though personne has a feminine gender).
Then, participants were asked to estimate the probability of winning the municipal elections for the five most popular candidates on an 11-point slider (the order of the candidates was alphabetical and fixed, see Figure 1 for an example). Contrary to von der Malsburg et al. (2020), we measured expectation and production within participants, meaning that each participant did both tasks: they first had to complete a sentence and then, on a subsequent page, they had to estimate the probability of winning the municipal elections for each candidate.
(3) Les élections municipales de mars 2020 vont déterminer qui/la personne qui dirigera la ville de Paris/Marseille.
    The elections municipal of March 2020 will decide who/the person who govern.fut the city of Paris/Marseille.
    'The municipal elections of March 2020 will decide who/the person who will govern the city of Paris/Marseille.'
(4) même si son pouvoir n'est pas absolu,…
    even if their power neg.be neg absolute,…
    'even if their power is not absolute,…'

Procedure
Participants read one sentence and completed another one as they wished. Then, on a subsequent page, they estimated the probability of winning the elections for five candidates (Paris: 3 women, 2 men; Marseille: 2 women, 3 men). The experiment lasted around 4 minutes.

Results
We excluded completions that:
- weren't a real sentence (le maire, oui…), N = 14 for Marseille and N = 40 for Paris;
- didn't directly express the candidate's gender (il faudra développer des moyens de transports diversifiés, etc.) and remained ambiguous, although such completions could also be considered, per se, as inclusive (La nouvelle équipe décide de consulter les citoyens […], or la ville est forte), N = 6 for Marseille and N = 25 for Paris.⁷
Eight participants from Paris and six from Marseille did the experiment twice, so we excluded their second participation. We took into account completions about the mayor in the three possible grammatical gender forms (2), either DPs (le, la maire, even the candidate's name) or pronouns (il, elle).⁸ This left us with 49 tokens for Marseille and 92 tokens for Paris (all participants' productions are available in the OSF repository). Our final dataset is unfortunately smaller than we originally expected; however, these results are still informative as to how participants' expectations are related to their language use. Figure 2 shows the proportion of the grammatical forms depending on the expectation that the mayor will be a woman (with expectation split at the median). This probability was computed separately for Paris and Marseille: we added up the probabilities given to the female candidates (3 in Paris and 2 in Marseille) and divided the sum by the total probability given to all candidates. Figure 3 is another visualisation of the results, with expectation treated as continuous. It shows that masculine grammatical gender is dominant, especially in Marseille, with inclusive forms (le/la maire, la personne, etc.) only appearing in Paris.
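The expectation measure described above (the female candidates' summed probability divided by the total probability given to all candidates) can be sketched in a few lines of Python. The candidate labels, the slider values and the 0 to 10 coding of the 11-point slider are hypothetical; the function name is ours and is not taken from the OSF materials.

```python
def female_expectation(ratings, female_candidates):
    """Share of the total rated probability that goes to female candidates.

    ratings: dict mapping candidate label -> slider value (assumed coded 0..10)
    female_candidates: labels of the candidates who are women
    Returns None when the participant gave 0 to every candidate.
    """
    total = sum(ratings.values())
    if total == 0:
        return None  # normalisation is undefined without any rated probability
    female_total = sum(ratings[name] for name in female_candidates)
    return female_total / total


# Hypothetical participant: five placeholder candidates, three of them women.
ratings = {"A": 6, "B": 2, "C": 1, "D": 3, "E": 0}
expectation = female_expectation(ratings, ["A", "B", "C"])
```

Dividing by the participant's own total means the measure does not require the five slider values to sum to a fixed amount.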
We fitted Bayesian binomial regression models (Carpenter et al. 2017; Bürkner 2017; Bürkner & Charpentier 2020) to test the reliability of our effects. For clarity, we will only report and explain the strong effects we found in the statistical analysis. More details about the choice of our analysis as well as the strength of the effects are available in Appendix B in the OSF repository.
As shown particularly in Figure 3 and confirmed in the statistical analysis, masculine forms are more dominant than feminine ones (we excluded inclusive forms from the analysis because of the small numbers). The effect of speaker expectations illustrated in both figures was also confirmed in the statistical analysis, as was the effect of city, though to a lesser extent. This can be interpreted as follows: the more participants think that the mayor will be a woman, the more they will use the feminine form. We also find a strong correlation between city and speaker expectation (r = .73), meaning that participants think a woman is more likely to win in Paris, so they use more feminine forms.

8 There were two occurrences, one including the plural pronoun ils, and the other one the noun Homme, that we considered as masculine. Occurrences of the pronoun il were considered as masculine forms even though, for one context sentence (Même si son pouvoir n'est pas absolu), the pronoun could be ambiguous (Même si son pouvoir n'est pas absolu, il est central).

Figure 3
Production of grammatical gender depending on expectation that the mayor will be a woman (from 0 = masculine, 0.5 = inclusive, to 1 = feminine) for Paris (red) and Marseille (green). Panels: low probability that the mayor will be a woman; high probability that the mayor will be a woman.

Discussion and conclusion
Our study on French replicated a number of results found by von der Malsburg et al. (2020); however, we also found some differences. Our first main result is that, like in the English studies, we find an effect of speaker expectation of the social gender of the next mayor on their use of grammatical gender. In both Paris and Marseille, higher degree of expectation in a female referent translates to more use of the feminine (and less use of the masculine). These results suggest that Gricean reasoning does underlie use of gender marking in French.
However, our most striking result is the dominance of masculine gender, regardless of degree of expectation. As in the US study, masculine is often used when participants think it's likely that the next mayor will be female. This pattern is not predicted by the transparent hypothesis, and actually even suggests that Gricean reasoning has been suspended here: even if French masculine gender can be used to refer to women, feminine grammatical gender is so much more informative to signal female social gender that any Gricean/rational/informativity-based theory of language use presumably predicts that feminine gender should at some point overtake masculine. However, this does not happen. Interestingly, we ran the same experiment for Paris before the second round. This time, the three candidates were all female (Agnès Buzyn, Rachida Dati and Anne Hidalgo), meaning the future mayor would necessarily be female. Figure 4 shows the proportion of the grammatical forms produced. Even though feminine forms are now used more, the proportion of masculine forms still remains quite high while the probability that the mayor will be female is 100%.⁹ Results from von der Malsburg et al. (2020) showed a bias against the feminine (participants transitioned to gender neutral forms when expectations that there would be a female president were high), whereas we did not find that in our results: there were more feminine forms, but masculine forms were still used, which points rather to a bias for masculine forms.
To the extent that this persistent use of the masculine is driven by stereotypes associated with French mayors, it is predicted by the stereotype hypothesis. Coming back to the puzzles raised in von der Malsburg et al. (2020) (i.e. the pragmatic mechanisms allowing stereotypes to override informativity), results from our experiment in French provide an additional argument in favour of including gender stereotypes into our pragmatic models of the use of gender-marked expressions, although, in this squib, we leave how exactly to do this as a puzzle for future work. The fact that we find strong stereotype effects in both English and French opens the door to the possibility that the stereotype effect is cognitively general. However, of course this needs to be further investigated with crosslinguistic research.

9 Context sentence seems to play a role in the second round (with more production of feminine pronouns with person who), but due to limited space, we do not discuss this here; the results are available on the OSF repository.
Evaluating the predictions of the stereotype hypothesis brings us to the consideration of the differences between Paris and Marseille. As discussed in the previous section, we find that Parisians think that it is more likely that a woman will win than Marseillais do.¹⁰ Because higher female expectation is related to more feminine grammatical gender, this translates into more feminine in Paris than in Marseille, which is predicted by the transparent hypothesis. The stereotype hypothesis also predicts that, at equal degrees of expectation, there should be more masculine in Marseille than in Paris since Marseillais presumably have a more male stereotype for mayor than Parisians. The differences in expectation between Paris and Marseille render evaluating this prediction difficult. If we look at the degrees of expectation between 0.5 and 0.75 (i.e. where it is more likely than not that a woman will win, and for which we have data in both Marseille and Paris), there indeed appears to be slightly more masculine in Marseille than in Paris, as shown in Figure 5.
When looking at the data, another difference observed between Paris and Marseille is the inventory of forms used. While inclusive forms are practically absent from Marseille (one occurrence of the inclusive word la personne), we find a small amount of écriture inclusive and inclusive words (la personne) in our Paris data.¹¹ With respect to the other puzzle (the pragmatic mechanisms allowing the interaction between expectations and stereotypes to override informativity), this pattern supports the hypothesis that French inclusive forms are not playing the same role in the gender marking system as English they: they do not really increase as participants' expectations in a female mayor increase, meaning that inclusive forms are not a strategy used by French speakers to resolve a clash between a high degree of female expectation and a strong male stereotype. Our study therefore suggests that, perhaps unlike the stereotype effect itself, the special way that expectations and stereotypes interact to produce more gender inclusive/neutral forms is language specific. We leave this hypothesis open for future research with additional data.
Why écriture inclusive should be used in Paris and not really in Marseille is unclear. One possibility may be that Parisians have a more gender neutral stereotype for mayor and that the inclusive forms arise because of this stereotype. Another possibility has to do with the salience of the inclusive forms themselves in Paris. As discussed in section (1), the incumbent Anne Hidalgo is very vocal about issues related to gender and sexuality, and, under her direction, city hall uses inclusive forms in many official contexts. Studies of other French cities would be desirable to see whether écriture inclusive is a Parisian exception.
10 Note that the results of the first round put female candidates in the top two spots in both Paris (Anne Hidalgo and Rachida Dati) and Marseille (Michèle Rubirola and Martine Vassal), so the current real probability that Paris and Marseille will have a female mayor is 1.
11 One other possibility to explain this pattern is that some Parisian participants take into account non-binary gender categories, which they use inclusive forms to communicate. However, we have no independent reason to believe that the Marseillais would be so different.

In conclusion, while some of the generalizations found by von der Malsburg et al. (2020) also characterize our French results, our cross-linguistic comparison highlights how the linguistic particularities of the English and French gender marking systems interact with speaker expectations and stereotype mental representations to create different patterns of production of gender marked expressions.
Our study also makes the theoretical puzzles raised by von der Malsburg et al. (2020)'s English results more pressing, since we show that the ability of stereotypes to interfere with Gricean reasoning is not specific to English. More generally, our study and von der Malsburg et al.'s together argue in favour of incorporating beliefs like gender stereotypes into our pragmatic models and having language specific patterns arise from the combination of stereotypes, Gricean reasoning and language specific gender marking inventories. How exactly this should be done is left to future work.

Ethics and consent
The experiments run for this paper followed standard ethics guidelines regarding consent and voluntary participation. All participants were informed about the experimental procedure and future data processing before providing their consent. Participation was voluntary and they could stop participating at any point.