Bilingualism, or knowledge of two or more languages, encompasses social, cognitive, and linguistic experiences. Growing evidence supports the notion that bilingual experiences are diverse, and no two bilinguals share the same social experiences that underlie their development and maintenance of bilingualism (Anderson et al., 2018; Gullifer & Titone, 2019, 2021; Leon Guerrero & Luk, 2021; López et al., 2021; Navarro-Torres et al., 2021; Titone & Tiv, 2022; Tiv et al., 2021a, 2022a; Wigdorowitz et al., 2020). Of importance, bilingual experiences have observable consequences for cognition, including language comprehension. Here, we focus on one form of pragmatic language comprehension, verbal irony, and how it relates to individual differences in bilingual experience.

Verbal irony (henceforth referred to as “irony”) is characterized by a juxtaposition between what is said and what is meant (Katz et al., 2013), resulting in pragmatic or intentional ambiguity. Irony is ubiquitous in everyday conversations (Gibbs, 2000), canonical literature (Müller, 2017), and political dialogue (Nuolijärvi & Tiittula, 2011). Decades of cross-disciplinary research have revealed that irony comprehension is shaped by a complex interplay of factors (Dews & Winner, 1995; Dress et al., 2008; Katz et al., 2013; Olkoniemi & Kaakinen, 2021; Olkoniemi et al., 2016; Pexman, 2008; Pexman & Olineck, 2002; Shamay-Tsoory et al., 2005), including, recently, bilingual experiences (Tiv et al., 2019, 2020, 2021b). In this paper, we examine the relationship between individual differences related to bilingual experience, specifically mentalizing capacity and neighborhood language diversity, and irony processing among a sample of bilingual adults living in a linguistically diverse region, Montréal, Canada.

Theories of irony comprehension and empirical evidence from bilinguals

Theoretical accounts of irony comprehension have mostly focused on monolingual or presumed monolingual speakers (for a comprehensive review, see Garmendia, 2018). While early theories of irony comprehension examined whether the pragmatic meaning of ironic language is processed serially or in parallel to the literal meaning (e.g., standard pragmatic view, direct access view, graded salience hypothesis; Gibbs, 1994; Giora, 1997; Grice, 1975), later approaches considered how the context in which irony is used may constrain processing. For example, irony may be used in response to a positive situation, like stating “this pie is atrocious” while gobbling down a third slice of prize-winning pie (“ironic compliments”). Conversely, irony may follow a negative situation, such as stating “this pie is delicious” while dubiously picking at a moldy slice of pie (“ironic criticisms”). The latter form, the ironic criticism, is more commonly used in North American discourse.

This asymmetry of affect is based on the notion that misunderstood (i.e., literally interpreted) ironic compliments can be more insulting and socially damaging than misunderstood ironic criticisms. According to the mention theory, people’s expectations of the world skew towards positive social norms and positive outcomes (Jorgensen et al., 1984; Sperber & Wilson, 1981). Since irony reflects implicit echoic mentions of past utterances, a misunderstanding of ironic criticisms (i.e., literally positive on the surface) would align with this default positive worldview. The pretense theory, which frames irony as evoking pretense or play, similarly suggests that ironic speakers and listeners will take on more discursive roles that reflect this positivity bias in irony (Clark & Gerrig, 1984).

Other factors may influence how irony is processed. The parallel constraint-satisfaction framework of irony comprehension highlights the interplay of discourse, speaker, and contextual cues, which are synthesized in parallel and probabilistically converge onto an ironic or literal interpretation of a pragmatically ambiguous statement (Katz et al., 2013; Pexman, 2008). Given the prevalence and social acceptability of ironic criticisms versus ironic compliments, this model predicts the ironic criticism interpretation would, over time, receive more activation and be a more plausible interpretation in subsequent interactions than ironic compliments. The interpretation of these cues is further modulated by speaker and listener-level attributes, including inferences about others’ mental states, which we probe in this paper (see also Colston & Gibbs, 2002; Kaakinen et al., 2014).

The social functions of irony may generally involve muting the meaning conveyed by literal language (the tinge hypothesis; Dews & Winner, 1995). This can result in speakers of ironic criticisms being perceived as less annoyed and more polite than speakers of literal criticisms, and speakers of ironic compliments being perceived as less pleased and less polite than speakers of literal compliments (Joergensen et al., 2021; Pexman & Olineck, 2002). Other work has shown that sarcastic irony is primarily used for four communicative functions among presumed monolinguals: general purposes, frustration diffusion, embarrassment diffusion, and face-saving (Ivanko et al., 2004).

Our findings indicate bilingual adults generally use sarcastic irony for similar functions as monolinguals (Tiv et al., 2019). In previous work, we examined how bilingual speakers comprehend ironic language (Tiv et al., 2020). Tiv et al. (2020) tested bilinguals on an irony reading and comprehension task in their first language, English. Bilingual adults rated ironic criticisms as more sensible than ironic compliments, and their responses were faster for ironic criticisms. While this study did not include a monolingual comparison group, the main result is consistent with other findings on presumed monolingual samples examining comprehension of written ironic language (e.g., Katz, 2005). The results are also consistent with recent work directly comparing first and second language English users on an irony identification task, which found that both first and second language users had trouble identifying ironic compliments (Ellis et al., 2021). This suggests that bilingual and monolingual adults, measured at the group level, may process written ironic language in similar ways.

However, we observed behavioral differences among bilingual adults, as a result of individual differences in bilingual language experience. In one study, we discovered a positive relationship between second language proficiency and self-reported general sarcastic irony use, irrespective of language (Tiv et al., 2019). In another study, proficiency in the second language predicted on-line first language irony comprehension, such that high proficiency bilinguals found ironic statements more sensible than low proficiency bilinguals and they were faster to respond to ironic compliments (Tiv et al., 2020). Related work on second language reading of metaphor, another pragmatic element, also evidenced how individual differences in second language proficiency modulated reading patterns (Olkoniemi et al., 2021).

There are many reasons why certain bilingual language experiences relate to irony comprehension, including enhanced metalinguistic awareness, executive control, mental state reasoning (reviewed in Schroeder, 2018), as well as tolerance for ambiguity (e.g., Dewaele & Wei, 2013). While this remains an open question, mounting evidence highlights the social experiential outcomes associated with bilingualism (Ikizer & Ramírez-Esparza, 2018; López et al., 2021; Ramírez-Esparza et al., 2020). Engaging with others through multiple languages may offer insight on their unique and diverse mental states, which may in turn boost ironic language comprehension (also discussed in Antoniou et al., 2019).

Bilingualism and mentalizing

Social cognitive capacities that offer insight on others’ mental states appear in the literature under many terms, including “theory of mind,” “perspective-taking,” “mentalizing,” and more. There is no clear consensus on whether these terms refer to distinct cognitive processes (e.g., Harris, 2017) or overlapping ones (e.g., Frith & Frith, 1999, 2021; Saxe & Kanwisher, 2003). Throughout this paper, we draw evidence from papers that use any of these terms, and we ourselves use the term “mentalizing” to broadly encompass thinking about others’ mental states.

As a group, bilinguals outperform monolinguals on a variety of mentalizing-based tasks across the lifespan (Goetz, 2003; Navarro & Conway, 2021; Rubio-Fernández & Glucksberg, 2012; Schroeder, 2018; Sundaray et al., 2018). A meta-analysis of 16 studies examining mentalizing between bilingual and monolingual children across many cultural and linguistic settings found higher performance among bilinguals. These results persisted despite controlling for language proficiency differences and testing for publication bias by means of Egger’s regression intercept test (Schroeder, 2018). Most of the studies in this meta-analysis used a version of the false belief task to test mentalizing. Navarro and Conway (2021) found that adult bilinguals also outperformed monolinguals on another mentalizing and perspective-based task (director task). While these studies provided the groundwork for understanding the relationship between bilingualism and mentalizing, their implication for the role of individual differences in bilingual experience is limited.

In two recent papers, our group examined individual differences in bilingual language experience on a novel mentalizing task (Tiv et al., 2021b, 2022b). In this task, participants naturally read sentence pairs that relied on mentalizing to cohesively resolve intentional ambiguity (e.g., A person did X because they were thinking Y). This approach mirrors the task designs commonly implemented in studies of irony comprehension (i.e., sentence reading). Further, it reflects the everyday situations, like reading a story or message from a friend, where people may engage mentalizing, as opposed to laboratory-contrived tasks. Results from both papers revealed that performance on this mentalizing task was associated with different social aspects of how bilinguals used their languages. Tiv et al. (2021b) revealed that greater diversity in how one personally uses their languages, as measured through language entropy (Gullifer & Titone, 2019), patterned with greater mentalizing capacities. Tiv et al. (2022b) extended these findings by examining interpersonal diversity of language use within one’s social network. The results indicated that more diverse social network experiences also related to greater mentalizing capacities on this task.

The present paper builds on this work by examining whether performance on this mentalizing task predicts irony comprehension, as irony provides a clear test case of resolving ambiguities in pragmatic meaning. We do so by also examining the interplay of these social, cognitive, and linguistic processes in the context of different language ecologies, indexed by ambient exposure to language diversity across different neighborhoods. In this final section, we review how language experiences extend beyond the internal properties of the individual and are formed by dynamics in the social environment.

Influence of social ecology

Burgeoning theoretical models of human cognition and language indicate the mind and its internal processes are influenced by external properties of the social context (discussed through the lens of bilingualism in Titone & Tiv, 2022; Tiv et al., 2022a). How a bilingual person processes language is related to how they use language with other people, how language appears in their social environment, how language is collectively valued, and how languages change over time. These high-order social contextual dynamics constrain how individual differences in bilingual experience relate to mentalizing. For instance, Tiv et al. (2022b) found that greater diversity in language-based social network structure was associated with greater mentalizing capacity only in a region of high linguistic diversity. This finding suggests that mentalizing may emerge as an adaptive cognitive process in response to the demands of specific social contexts. Therefore, in the present paper, we examine the role of variations in people’s linguistic social ecologies, specifically language diversity across residential neighborhoods, in the relationship between mentalizing and irony comprehension.

Ambient exposure to language diversity has been linked to greater mentalizing behavior among infants (Liberman et al., 2017) and children (Fan et al., 2015). In both studies, living in an area with high exposure to many languages related to greater performance on the director task (and an adapted version for infants), whereas no relationship was detected with executive control abilities. Other work has shown exposure to diverse languages also facilitates acquisition of novel languages through increased sensitivity to the variance of sounds and linguistic inputs (Bice & Kroll, 2019). Taken together, it is possible that ambient, ecological exposure to multiple languages also engages more robust mentalizing and irony comprehension among adults.

Research from variational pragmatics has shown that irony and other forms of pragmatic language (e.g., politeness, dialect) vary as a function of social contextual and regional factors (Schneider, 2010). For example, Dress et al. (2008) found that adults living in the northern United States use more ironic comments than adults living in the southern United States, even after controlling for demographic traits and geographic properties. Others have shown similar regional results for perceptions of pragmatic intent (Cohen et al., 1996) and preferences for apologies (Barron, 2009). In one experimental study, changes in socio-ecological indicators within narrower geographic regions (neighborhoods of a city) caused dialectal shifts in Black youths’ use of African American English (Rickford et al., 2015). Beyond the domain of language, neighborhood-level social-contextual features, such as racial diversity, correlate with other social cognitive behaviors, such as implicit biases (Hehman et al., 2021; Ofosu et al., 2019; Sadler & Devos, 2020). These sources of ambient social information may serve as environmental cues for setting expectations of adaptive behavior.

Present Study

This paper examines the relationship between mentalizing capacity, neighborhood exposure to language diversity, and irony comprehension among bilingual adults in Montréal, Canada. Past findings implicate mentalizing as a core process underlying the link between individual differences in bilingual experience and irony comprehension (Antoniou et al., 2019; Tiv et al., 2019, 2020, 2021b). Mentalizing itself may be constrained by ambient exposure to language diversity in the social environment (Fan et al., 2015; Liberman et al., 2017). Thus, while greater mentalizing capacities and exposure to language diversity may each boost irony comprehension on their own, for the reasons discussed above, we expect an interaction between these two factors, similar to past work (Tiv et al., 2022b).

We use a novel irony comprehension task that probes ironic compliments and ironic criticisms on appropriateness and perceived irony. These metrics indirectly evaluate whether people understand the rich and complex social communicative functions of irony, like sparking humor, signaling group membership, softening critical attitudes, or expressing power (Burgers et al., 2015; Dews & Winner, 1995; Drucker et al., 2014; Pexman & Olineck, 2002). We predict that greater mentalizing capacities will pattern with more appropriate and accurate perceptions of irony, particularly in high language diversity neighborhoods, as these contexts may nurture stronger social cognitive processing. We also expect potential differences between ironic compliments and ironic criticisms, given the social salience of ironic criticisms in North American discourse.

The experimental items created for this irony comprehension task are novel; thus, we also include a manipulation check of appropriateness and perceived irony ratings and reaction times across the sample. Consistent with past theoretical frameworks (e.g., mention theory, pretense theory, tinge hypothesis, parallel constraint-satisfaction framework; Clark & Gerrig, 1984; Dews & Winner, 1995; Pexman, 2008; Sperber & Wilson, 1981) and empirical evidence from English–French bilinguals in Montréal, Canada (Tiv et al., 2020), we hypothesize that ratings of ironic criticisms will be faster and higher in appropriateness and perceived irony than ratings of ironic compliments.

Methods

Sociolinguistic context

The city of Montréal is globally recognized for its unique multilingualism, which many characterize through the popularized greeting “bonjour, hi,” heard in most retail, food, and service settings (Heller, 1978; Leimgruber, 2020). This unique linguistic milieu resulted from culturally and linguistically diverse First Nations communities, which the French colonized in the late 16th to early 17th centuries, and which the British seized after the Seven Years’ War in 1763. Political tensions, religious affiliation, and economic inequality gave rise to social stratification between English and French speakers, which culminated in the Quiet Revolution in the 1960s. During this period, Québécois nationalism grew among French speakers, who comprised a socially and economically disadvantaged majority (Sioufi & Bourhis, 2017). This led to language legislation still in effect today, encoding French as the only official language of Québec’s provincial government (whereas the federal Canadian government recognizes both English and French). Today, given the tumultuous history of English and French, the rise of other language groups (e.g., Arabic, Spanish), and increased economic and educational immigration, the Island of Montréal continues to experience distinct linguistic stratification and diversity (see Heller, 1982, for a review).

Participants

Fifty-four healthy bilingual adults aged 18 to 35 (M = 22.2 years, SD = 3.0 years) living in Montréal, Canada, were recruited using English and French flyers, online advertisements, and word of mouth. All recruitment materials indicated that eligible participants must be proficient in English and French, the two official languages of Canada, though all experimental materials were in English. Primary demographic characteristics of this sample are provided in Table 1.

Table 1 Sample demographic statistics (N = 54)

Participants born outside of Canada were born in France (11), United States (3), China (2), Ivory Coast (2), Bangladesh (1), India (1), and Martinique (1). In addition to knowing both English and French, the sample reported knowledge of Bengali, Creole (Martiniquan, Mauritian), German, Gujarati, Hebrew, Hindi, Italian, Japanese, Mandarin, Mandinka, Russian, and Spanish. We also calculated general language entropy using the languageEntropy package in R to assess each participant’s overall balance (or diversity) of language use (Gullifer & Titone, 2019). The mean language entropy score of our sample was 0.85 (SD = 0.34), suggesting that on average, our participants regularly used multiple languages in a balanced manner (see Supplementary Materials for how language entropy is calculated).
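To make the entropy measure concrete, the following is a minimal sketch, assuming that language entropy is the Shannon entropy (base 2) of a person's proportions of language use, as described by Gullifer and Titone (2019). It is written in Python for illustration only and is not the languageEntropy R package used in the study; the function name and the example proportions are hypothetical.

```python
import math


def language_entropy(proportions):
    """Shannon entropy (base 2) of language-use proportions.

    Assumption: `proportions` holds one non-negative value per language
    (e.g., share of daily use). Values are renormalized to sum to 1.
    """
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    # Languages with zero use contribute nothing to the entropy.
    probs = [p / total for p in proportions if p > 0]
    return sum(-p * math.log2(p) for p in probs)


# Hypothetical usage profiles for two languages.
balanced = language_entropy([0.5, 0.5])  # equal use of both languages
single = language_entropy([1.0, 0.0])    # only one language used
skewed = language_entropy([0.8, 0.2])    # one language dominates
```

Under this definition, a person who uses a single language scores 0, and a person who uses two languages equally scores 1, so higher values reflect more balanced use.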

Materials and procedure

All data were collected in the laboratory prior to the COVID-19 pandemic. Our outcome measures of interest were based on performance on the irony task. Individual differences in mentalizing capacity were computed from a mentalizing task (Tiv et al., 2021b, 2022b). Participants’ neighborhood language diversity was computed from Statistics Canada’s national census based on their postal code provided on the language history questionnaire. The order of the mentalizing task (approximately 20 minutes) and irony task (approximately 40 minutes) was counterbalanced across all participants. A short language history questionnaire, developed in our lab, was administered between the two tasks. Participants were also offered a short break before starting the second reading task.

Irony task

We used a reading and rating task to assess individual differences in irony comprehension (our outcome variable). In this task, participants read and rated 128 short stories in English. Each item began with an introductory sentence that was identical across conditions. This sentence introduced a task or activity involving the reader and a story character (e.g., “You and Henry play with your newly adopted cat”). The second sentence was a positive or negative scenario that followed the introduction (e.g., Positive: “The cat is energetic and playful for most of the day”; Negative: “The cat is sleepy and sluggish for most of the day”). The final sentence involved a positive or negative statement made by the story character to the reader (e.g., Positive: “What a lively cat”; Negative: “What a lazy cat”). This 2 (scenario) × 2 (statement) design resulted in four item conditions: literal compliment (positive scenario, positive statement), literal criticism (negative scenario, negative statement), ironic compliment (positive scenario, negative statement), and ironic criticism (negative scenario, positive statement). Examples of these conditions are available in Table 2.
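The mapping from the 2 (scenario) × 2 (statement) design to the four item conditions can be sketched as follows. This is an illustrative Python sketch of the design logic described above, not the experiment software; the function and variable names are hypothetical.

```python
from itertools import product

VALENCES = ("positive", "negative")


def item_condition(scenario, statement):
    """Return the item condition for a scenario/statement valence pair."""
    if scenario not in VALENCES or statement not in VALENCES:
        raise ValueError("valence must be 'positive' or 'negative'")
    # Matching valences give a literal item; mismatching ones give irony.
    kind = "literal" if scenario == statement else "ironic"
    # The scenario valence decides compliment versus criticism.
    tone = "compliment" if scenario == "positive" else "criticism"
    return f"{kind} {tone}"


# Enumerate all four cells of the design.
conditions = {
    (scenario, statement): item_condition(scenario, statement)
    for scenario, statement in product(VALENCES, repeat=2)
}
```

For the cat example, a sleepy cat (negative scenario) followed by “What a lively cat” (positive statement) falls in the ironic criticism cell.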

Table 2 Example irony items

The average length of the scenario sentence was 10 words (range: 6–16). Within each itemset, the positive and negative conditions followed a similar structure, but they differed in the use of positively or negatively valenced words or phrases. In some items, these conditions differed by a single word (e.g., “the other runner finishes the race in [first]/[last] place”), whereas in others these conditions differed by a few words or a phrase (e.g., “the neighbor aims for the board and [breaks it cleanly]/[misses it entirely]”).

The average length of the statement sentence was also 10 words (range: 8–10). Within each itemset, the positive and negative conditions were identical except for the adjective in the statement. In the positive condition the adjective was positively valenced, and in the negative condition the adjective was negatively valenced. The positive and negative adjectives within each itemset were selected to be of similar lengths and frequencies, which was assessed from the CLEARPOND English corpus (Marian et al., 2012). The majority of statements took the following structure: “what a [adjective] [noun].” A handful of items were formatted as “that is [adjective] [noun]” or “someone is [adjective] today.” The same structure was adopted for all conditions within an item set.

In this task, participants were instructed to silently read each item and press the space bar on a keyboard upon comprehension. Then, participants made a series of three judgements for each item by pressing the corresponding number on a keyboard: naturalness, appropriateness, and irony (all 1 to 5; 1 = low, 5 = high).

Naturalness assessed how natural the wording of the statement seemed, and this question was used during preprocessing to filter out any items that may have been perceived in a grammatically unnatural way. Ironic compliments and criticisms were not matched on naturalness, and naturalness ratings for ironic criticisms were slightly greater than ratings for ironic compliments (ironic criticisms: M = 3.45, SD = 1.42; ironic compliments: M = 2.65, SD = 1.50). Naturalness ratings for literal compliments and literal criticisms were also higher than those for ironic conditions (literal compliments: M = 4.16, SD = 1.16; literal criticisms: M = 3.88, SD = 1.27). An analysis of variance (ANOVA) revealed a significant difference in naturalness ratings across the four conditions (F = 402.5, df = 3, p < .01). As past research has indicated, controlling for naturalness when comparing ironic and literal statements is not possible due to inherent differences in the frequency of ironic and literal language in daily discourse, which may render literal comments as more familiar and natural seeming than ironic ones (Țurcan & Filik, 2016). This is also the case when comparing different irony forms, as ironic criticisms are found to be much more common in North American discourse than ironic compliments (Clark & Gerrig, 1984; Katz, 2005; Sperber & Wilson, 1981). While it is possible that these differences affected the other ratings or response times, other work has shown that this may not be the case (Gibbs, 1986).

The appropriateness probe assessed the social acceptability of the statement, and perceived irony assessed how ironic the statement seemed to the reader. Response times to all three ratings were recorded. Participants were only shown each item in one of the four possible conditions, and the presentation order was randomized across all participants.

Mentalizing task

We used the inference task from Tiv et al. (2021a, b) to assess individual differences in mentalizing. In this task, participants read and rated 138 English sentence–pair item sets (see Table 3 for examples). Each item was composed of two sentences: the context and the action. The context sentence was unique across the three inference types and described a situation involving a character. In contrast, the action sentence was identical across the three inference types and described an action that was either related or unrelated to the first sentence, or context. This design yielded 552 unique sentences (414 unique contexts + 138 unique actions) with an average of 13 words across both sentences. All items were designed in three inference type conditions: mental state, logical, and incoherent (i.e., participants read 46 items in each condition).

Table 3 Example mentalizing items

The three inference type conditions varied in the type of inference needed to connect the context and action sentences: mental state, logical, or incoherent (i.e., no inference). In the mental state inference type, the context and action sentences could be connected by considering the thoughts, feelings, intentions, and beliefs of the story character (i.e., mental states). In contrast, the logical inference type relied less on an internal understanding of the story character’s mind and was more based in general, non-social deductions built from world knowledge of causality. The incoherent condition served as a baseline control for when no inference would aid in connecting the sentences. Critically, these three conditions only varied in the context sentence whereas the action sentence was identical.

For each item, participants were instructed to silently read the sentences for comprehension. Since inferences on the basis of mental states are first and foremost inferences (Harris, 2017), we first asked participants to rate the extent to which each item was linguistically coherent (1 to 5; 1 = no coherence, 5 = full coherence). This allowed us to dissociate general inferences based in deductive reasoning from inferences based in reasoning about mental states. From there, participants rated the extent to which the item relied on mentalizing, or an understanding of the story character’s mental states, emotions, intentions, goals, and beliefs (1 to 5; 1 = no mentalizing, 5 = full mentalizing). This rating was considered the primary outcome of interest in assessing mentalizing capacity.

In previous work (Tiv et al., 2021b, 2022b), bilingual participants in Montréal, Canada, were accurate in rating items in the mental state condition higher in mentalizing than items in the logical condition. To ensure participants in the present study were also accurately rating the items, we computed mean mentalizing ratings to the three inference type conditions. As shown in Fig. 1, our sample of participants accurately identified mentalizing for mental state inferences, as compared to the other two inference types. Thus, mentalizing ratings were effective at capturing inferences on the basis of mental states.

Fig. 1 Mentalizing task. (a) Mean ratings to the three inference type conditions plus/minus one standard error of the mean. The means and standard deviations are as follows: incoherent = 1.71 (1.36), logical = 2.35 (1.48), mental state = 3.75 (1.40). (b) The distribution of the mentalizing difference score used as an individual difference predictor

We tested the internal consistency of this task to ensure items were measuring the same thing. Cronbach’s alpha on the mentalizing ratings of all items was α = 0.93 (95% confidence interval: α = [0.91, 0.96]), which indicates high covariance across the itemset. This means the items are highly likely to be measuring the same underlying concept. We also computed a trial-by-trial mixed-effects linear regression of inference type on mentalizing ratings. This model, much like prior analyses of this task (Tiv et al., 2021b, 2022b), included random effects by-item and by-subject. As expected, the by-item random variance was low (0.06), particularly when compared to the by-subject random variance (0.24). This result offers additional evidence that the variance between items was not substantial.
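As an illustration of the internal consistency check, the sketch below computes Cronbach’s alpha from its standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). It is a minimal Python sketch that assumes a complete participant-by-item matrix of ratings; the toy data are invented for illustration and are not the study data, and the original analysis may have used different software.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a complete participant-by-item rating matrix.

    Assumption: `ratings` is a list of rows, one row per participant and
    one column per item, with no missing values.
    """
    n_items = len(ratings[0])
    if n_items < 2 or len(ratings) < 2:
        raise ValueError("need at least two items and two participants")
    # Sample variance of each item (column) across participants.
    item_vars = [variance(column) for column in zip(*ratings)]
    # Sample variance of each participant's summed score.
    total_var = variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Invented toy data: 4 participants rating 3 items on a 1 to 5 scale.
toy_ratings = [
    [1, 2, 1],
    [3, 3, 2],
    [4, 5, 4],
    [5, 4, 5],
]
alpha = cronbach_alpha(toy_ratings)
```

Alpha approaches 1 when participants who rate one item highly also tend to rate the other items highly, which is the pattern the reported value of 0.93 reflects.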

Given that our primary interest in this work was to assess individuals’ mentalizing capacity, we calculated a difference score between the mental state and logical conditions on the mentalizing rating. To do this, for each participant we averaged their mentalizing ratings for all mental state condition items and all logical condition items. Then, we subtracted the mean logical condition score from the mean mental state condition score to procure the mentalizing difference score. A mentalizing difference score of zero indicates that the participant equally associated mentalizing to the mental state and logical conditions, thus demonstrating poor discernment for the specific cases when mentalizing is needed. A positive mentalizing difference score indicates that the participant associated more mentalizing to the mental state condition, whereas a negative mentalizing difference score indicates that the participant associated more mentalizing to the logical condition. Thus, our mentalizing difference score indicates the extent to which a participant selectively and accurately discerned the need for mentalizing to only the mental state condition. Figure 1 demonstrates the distributed variance in this difference score, which suggests it may be well suited as a metric of individual difference.
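The difference score described above can be sketched in a few lines. This is a minimal Python illustration, assuming each trial is stored as a (condition, rating) pair for a single participant; the data layout, function name, and example ratings are hypothetical.

```python
from statistics import mean


def mentalizing_difference(trials):
    """Mean mental state rating minus mean logical rating for one participant.

    Assumption: `trials` is an iterable of (condition, rating) pairs, with
    condition in {"mental state", "logical", "incoherent"}.
    """
    trials = list(trials)
    mental = [rating for cond, rating in trials if cond == "mental state"]
    logical = [rating for cond, rating in trials if cond == "logical"]
    if not mental or not logical:
        raise ValueError("need ratings in both conditions")
    # Incoherent trials are ignored; they do not enter the score.
    return mean(mental) - mean(logical)


# Invented ratings for one hypothetical participant.
example_trials = [
    ("mental state", 4),
    ("mental state", 5),
    ("logical", 2),
    ("logical", 3),
    ("incoherent", 1),
]
score = mentalizing_difference(example_trials)
```

A positive score, as in this example, corresponds to a participant who attributes more mentalizing to mental state items than to logical items.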

Language history questionnaire

All participants completed a brief language history and demographic questionnaire, which took approximately ten minutes to complete. This survey probed basic aspects of their identity (e.g., age, gender, ethnicity/race, education) and language experience (e.g., all known languages, daily use of each language, age of acquiring each language). This survey also gathered the current residential postal code of each participant, which we used to compute a neighborhood ecological index of language diversity from the Canadian Census.

Computing ecological language diversity

We used Statistics Canada’s 2016 Canadian Census Profile to examine language use in neighborhoods inhabited by our participant sample (Statistics Canada, 2016). This demographic survey is distributed to all Canadian households every 5 years and contains 100% data, which Statistics Canada describes as meaning that “data were collected for all units (dwellings) of the target population, therefore no sampling is done.” Institutional residents, or “a person, other than a staff member and his or her [their] family, who lives in an institution, such as a hospital, a nursing home, or jail,” were excluded from the population.

All census responses are tagged with the respondent’s residential Forward Sortation Area (the first three characters of the postal code). Thus, we matched the first three postal code characters of our participant sample to the population statistics collected by Statistics Canada for that same neighborhood. We calculated the language diversity of each Forward Sortation Area using Wilcox’s (1973) Index of Qualitative Variation (IQV), formalized below, to quantify the heterogeneity, variability, dispersion, or diversity in respondents’ mother tongue across the 182 possible language categories outlined on the census:

$$IQV=1-\frac{\sum_{i=1}^{K}\left(f_m-f_i\right)}{N\left(K-1\right)}.$$
(1)

In this formula, $f_i$ is the frequency of the $i$th category, $f_m$ is the frequency of the modal category, $N$ is the number of cases, and $K$ is the number of categories. The Language Diversity index is bounded between 0 and 1, where low language diversity scores (close to 0) indicate a neighborhood that is heavily dominant in one language as the mother tongue. In contrast, a high language diversity score (close to 1) indicates multiple pervasive mother tongues in the neighborhood. Language Diversity scores across all residential Forward Sortation Areas of the Greater Montréal region are illustrated in Fig. 2. This map highlights the range of language diversity scores across this region.
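As a rough sketch of Eq. 1, the Python below computes the index from a list of mother-tongue counts for one neighborhood and extracts a Forward Sortation Area from a postal code. The function names, the `n_categories` parameter, and the example counts and postal code are assumptions made for illustration; they are not taken from the census files or from the authors' code.

```python
def forward_sortation_area(postal_code):
    """Return the first three characters of a Canadian postal code, uppercased."""
    return postal_code.replace(" ", "").upper()[:3]

def language_diversity(counts, n_categories=None):
    """Index of Qualitative Variation as written in Eq. 1.

    `counts` holds the number of respondents per mother-tongue category.
    `n_categories` (K) defaults to len(counts); categories missing from
    `counts` are treated as having zero respondents.
    """
    counts = list(counts)
    k = n_categories if n_categories is not None else len(counts)
    n = sum(counts)
    if k < 2 or n == 0:
        raise ValueError("need at least two categories and one respondent")
    f_m = max(counts)  # frequency of the modal category
    # Each unlisted (zero-frequency) category contributes f_m - 0 to the sum.
    total_diff = sum(f_m - f for f in counts) + (k - len(counts)) * f_m
    return 1 - total_diff / (n * (k - 1))

print(forward_sortation_area("h3a 1b1"))      # H3A
print(language_diversity([40, 0, 0, 0]))      # 0.0 (one dominant language)
print(language_diversity([10, 10, 10, 10]))   # 1.0 (evenly spread)
```

The two extreme cases illustrate the bounds described above: a single mother tongue yields 0, and an even spread across all categories yields 1.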

Fig. 2

Language diversity across Montréal. Note. Figure adapted from Tiv et al. (2022a). Darker colors indicate greater language diversity scores

Results

Data preprocessing

Prior to data analysis, we implemented a series of planned preprocessing steps to the irony task data, which closely matched those conducted for the mentalizing task in Tiv et al. (2021a, b). These steps left us with the ratings of all 54 participants and the response times of 50 participants. Details regarding these steps and their justification can be read in Supplementary Materials.

Data analysis

Our design yielded seven possible dependent variables, but we focused our primary analysis on the actual ratings of the items (appropriateness and perceived irony). For the primary analysis, response time data were not examined given the possibility of unintended noise arising from the experimental design. This design instructed participants to input their numerical ratings on a conventional keyboard, where the time to press different numbers would vary. Additional noise may have been added to the response times because participants were asked to make several back-to-back ratings of the same item probe and may have been thinking about one rating as they responded to another. For these reasons, while we report the overall response time results in the manipulation check section, we do not report response times as a function of mentalizing capacity and neighborhood language diversity in the manuscript. These results can be found in the Supplementary Materials and should be interpreted with caution.

Moreover, the naturalness probe was intended to be used for preprocessing purposes (i.e., filtering out hard to understand items). Therefore, naturalness ratings were not analyzed or used to answer our research questions.

All data were analyzed with linear mixed-effects regression models using the lme4 package in R (Bates et al., 2015). Following Matuschek et al. (2017), all models included maximal random intercepts and slopes by-subject and by-item. In cases of nonconvergence, the random slope contributing the least variance was dropped to reach convergence (Barr et al., 2013). For the manipulation check, response times were log-transformed for normality. For all models, we calculated pseudo R2 with the ‘r.squaredGLMM’ function from the MuMIn package in R (Bartoń, 2020), and 95% confidence intervals were calculated using the ‘confint’ function with the Wald method from the stats package. Analysis code is available on the Open Science Framework (https://osf.io/pszv7/).
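For readers unfamiliar with Wald intervals, the sketch below shows the arithmetic: the estimate plus or minus a normal quantile times the standard error. It is written in Python purely for illustration (the analysis itself was run in R), the function name is hypothetical, and the inputs are the rounded intercept and standard error reported for appropriateness ratings in the manipulation check, so small rounding differences from published intervals are possible.

```python
from statistics import NormalDist

def wald_ci(estimate, se, level=0.95):
    """Wald confidence interval: estimate +/- z * SE, with z from the normal distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return estimate - z * se, estimate + z * se

# Rounded values as reported for the appropriateness intercept.
low, high = wald_ci(4.45, 0.07)
print(round(low, 2), round(high, 2))  # 4.31 4.59
```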

Manipulation check

Since the experimental itemset was created for this study, we first modelled our core manipulation on appropriateness rating, appropriateness response time, perceived irony rating, and perceived irony response time. Across these four models, both scenario and statement were treatment coded since both consisted simply of two levels (0 = positive, 1 = negative). For all four dependent variables, we detected a significant interaction between Scenario and Statement, suggesting the core manipulation targeted positive and negative ironic and literal statements, as intended (Fig. 3). Means and standard deviations for appropriateness and perceived irony ratings to each condition are available in Table 4.

Fig. 3

Manipulation check results. Note. Linear mixed-effects model-predicted appropriateness rating (a), perceived irony rating (b), appropriateness response time (c), and perceived irony response time (d). While response time was log-transformed in the models, raw response times are illustrated in the figure. Linetype and color (redundant coding) represent item condition. Standard error bars represent plus or minus one standard error of the mean. (Color figure online)

Table 4 Irony task appropriateness and perceived irony summary statistics

As expected, appropriateness ratings differed across all four conditions: literal compliments were rated highest in appropriateness (intercept = 4.45, SE = 0.07, t = 62.34, p < .001, 95% CI [4.31, 4.59]), and ironic compliments were rated lowest in appropriateness (B = −2.37, SE = 0.07, t = −34.00, p < .001, 95% CI [−2.51, −2.23]). Literal criticisms were rated less appropriate than literal compliments but more appropriate than either form of irony (B = 3.36, SE = 0.05, t = 62.63, p < .001, 95% CI [3.25, 3.46]). Ironic criticisms were rated more appropriate than ironic compliments but less appropriate than either literal statement (B = −1.60, SE = 0.07, t = −22.32, p < .001, 95% CI [−1.74, −1.45]). The pseudo R2 of this model was 0.51 (conditional) and 0.33 (marginal).

Appropriateness log-response time for literal compliments was faster than any other condition (intercept = 7.11, SE = 0.066, t = 108.47, p < .001, 95% CI [6.98, 7.24]). Appropriateness response times to ironic criticisms (B = 0.37, SE = 0.024, t = 15.31, p < .001, 95% CI [0.32, 0.42]) and ironic compliments (B = 0.36, SE = 0.024, t = 15.03, p < .001, 95% CI [0.32, 0.41]) were equally slow. The pseudo R2 of this model was 0.22 (conditional) and 0.02 (marginal).

Perceived irony ratings to literal compliments (intercept = 1.24, SE = 0.05, t = 23.31, p < .001, 95% CI [1.14, 1.35]) and literal criticisms (B = −5.65, SE = 0.05, t = −108.89, p < .001, 95% CI [−5.75, −5.55]) were both lower than to either form of irony. Perceived irony ratings were higher to ironic criticisms (B = 3.14, SE = 0.04, t = 85.52, p < .001, 95% CI [3.07, 3.21]) than ironic compliments (B = 2.63, SE = 0.05, t = 48.36, p < .001, 95% CI [2.52, 2.74]). The pseudo R2 of this model was 0.66 (conditional) and 0.60 (marginal).

Perceived irony log-response time was fastest for literal compliments (intercept = 7.18, SE = 0.062, t = 115.44, p < .001, 95% CI [7.05, 7.30]) and slowest for ironic compliments (B = 0.32, SE = 0.023, t = 13.98, p < .001, 95% CI [0.28, 0.37]). Perceived irony response times for both ironic criticisms (B = 0.21, SE = 0.023, t = 9.21, p < .001, 95% CI [0.17, 0.26]) and literal criticisms (B = −0.32, SE = 0.032, t = −9.85, p < .001, 95% CI [−0.38, −0.26]) were slower than for literal compliments but faster than for ironic compliments, though not different from each other. The pseudo R2 of this model was 0.21 (conditional) and 0.01 (marginal).

The role of mentalizing and ecological language diversity

For our primary analysis, we added two individual difference variables to our initial models: mentalizing difference score (continuous, scaled) and neighborhood language diversity (continuous, scaled), resulting in a four-way Scenario × Statement × Mentalizing × Diversity interaction (Fig. 4). The mentalizing difference score and neighborhood language diversity were modestly correlated (r = −.36), but this value was below conventional correlational thresholds (Berry et al., 1985; Mela & Kopalle, 2002) and thus did not raise concerns about multicollinearity. Summary statistics for these predictor metrics are provided in Table 5.
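To make "continuous, scaled" and the correlation check concrete, here is a small Python sketch. It assumes that scaling means z-scoring with the sample standard deviation (as R's `scale()` does by default), which the text does not state explicitly. The function names and toy values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev

def scale(values):
    """Z-score a list of numbers using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pearson_r(x, y):
    """Pearson correlation computed from the z-scored inputs."""
    zx, zy = scale(x), scale(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

# Toy predictor values for five hypothetical participants.
mentalizing = [0.5, 1.2, 2.0, 0.1, 1.6]
diversity = [0.71, 0.64, 0.60, 0.80, 0.58]
print(scale([1, 2, 3]))  # [-1.0, 0.0, 1.0]
print(round(pearson_r(mentalizing, diversity), 2))
```

After scaling, each predictor has a mean of 0 and a standard deviation of 1, so model coefficients describe the change per standard deviation of the predictor.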

Fig. 4

Individual difference results. Note. Linear mixed-effects model-predicted appropriateness (a) and perceived irony (b) rating results. Linetype and color (redundant coding) represent item condition. Left panel illustrates low-language-diversity neighborhoods, and right panel illustrates high-language-diversity neighborhoods (language diversity was binned for illustrative purposes). Standard error bands represent plus or minus one standard error of the mean. (Color figure online)

Table 5 Predictor summary statistics

In these models, scenario and statement were deviation coded (−0.5 = positive, 0.5 = negative) to more easily interpret higher order interactions. We strove to model a maximal random effects structure (Matuschek et al., 2017), but in cases of convergence issues we dropped random slopes explaining the least variance (Barr et al., 2013).
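The difference between the two coding schemes can be seen by writing out the predictor values for each of the four cells. The Python sketch below is illustrative only: the dictionary and function names are hypothetical, and the mapping of cells to condition labels is inferred from the description of the design (for example, a negative statement in a positive scenario is an ironic compliment).

```python
TREATMENT = {"positive": 0.0, "negative": 1.0}   # manipulation check models
DEVIATION = {"positive": -0.5, "negative": 0.5}  # individual difference models

# (scenario, statement) -> condition label, inferred from the design description.
CONDITIONS = {
    ("positive", "positive"): "literal compliment",
    ("positive", "negative"): "ironic compliment",
    ("negative", "positive"): "ironic criticism",
    ("negative", "negative"): "literal criticism",
}

def design_row(scenario, statement, coding):
    """Return (scenario code, statement code, interaction code) for one cell."""
    s, t = coding[scenario], coding[statement]
    return (s, t, s * t)

for (scenario, statement), label in CONDITIONS.items():
    print(label,
          design_row(scenario, statement, TREATMENT),
          design_row(scenario, statement, DEVIATION))
```

Under treatment coding the intercept corresponds to the cell coded (0, 0), that is, literal compliments, and the lower-order terms are simple effects at that reference level. Under deviation coding every predictor column is centered on zero across the four cells, so lower-order terms are averaged over the levels of the other factor, which is what makes higher-order interactions easier to read.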

The model for appropriateness rating detected a significant four-way Scenario × Statement × Mentalizing × Diversity interaction (B = −0.52, SE = 0.07, t = −7.36, p < .001, 95% CI [−0.65, −0.38]). A follow-up test on only positive scenarios indicated that appropriateness ratings for ironic compliments increased in comparison to literal compliments as mentalizing difference scores and neighborhood language diversity also increased (B = 0.33, SE = 0.05, t = 7.29, p < .001). Similarly, a follow-up test on only negative scenarios indicated that appropriateness ratings for ironic criticisms increased in comparison to literal criticisms as mentalizing difference scores and neighborhood language diversity also increased (B = 0.18, SE = 0.05, t = 3.34, p < .001). A third follow-up test indicated no difference in appropriateness ratings to ironic compliments and ironic criticisms. Taken together, as personal mentalizing and neighborhood language diversity increased, both forms of irony were rated as more appropriate compared to literal statements (Fig. 4). The pseudo R2 of this model was 0.51 (conditional) and 0.37 (marginal).

Our model for perceived irony ratings also detected a significant four-way Scenario × Statement × Mentalizing × Diversity interaction (B = −0.26, SE = 0.01, t = −3.90, p < .001, 95% CI [−0.39, −0.13]). A follow-up test on only positive scenarios indicated that perceived irony ratings for ironic compliments increased in comparison to literal compliments as mentalizing difference scores and neighborhood language diversity also increased (B = 0.13, SE = 0.05, t = 2.81, p = .005). Similarly, a follow-up test on only negative scenarios indicated that perceived irony ratings for ironic criticisms increased in comparison to literal criticisms as mentalizing difference scores and neighborhood language diversity also increased (B = 0.13, SE = 0.05, t = 2.92, p = .004). A third follow-up test indicated that the increase in perceived irony to ironic compliments was greater than to ironic criticisms (B = 0.12, SE = 0.05, t = 2.31, p = .02). Together, as personal mentalizing and neighborhood language diversity increased, perceived irony, particularly to ironic compliments, increased compared to literal statements (Fig. 4). The pseudo R2 of this model was 0.67 (conditional) and 0.61 (marginal).

We ran a series of checks to ensure these patterns of results were specific to the effects under study. To do this, we included several control variables in each model (Supplementary Materials). First, we checked whether the role of individual differences in mentalizing capacity was specific to inferences on the basis of mental states, as opposed to a more general ability to detect linguistic coherence. We added each participant’s average coherence ratings to the mental state condition from the mentalizing task to the model. This did not change the significant interaction that was detected. Next, we accounted for individuals’ own daily exposure to multiple languages by adding covariates for personal language entropy (continuous, scaled) and percentage daily English use (continuous, scaled). Again, we did not detect any change to the significant interaction of interest. Finally, we ensured the ecological influences of language diversity were not confounded by other sociodemographic characteristics of the neighborhood. Thus, we added a covariate for neighborhood-level socioeconomic status (continuous, scaled), as measured through percentage of low-income households in each Forward Sortation Area from the 2016 Census Profile. This did not change the significant interaction between mentalizing capacity and neighborhood language diversity on irony ratings.

Discussion

This paper examined the relationship between two sources of individual differences among bilinguals, mentalizing capacity and neighborhood linguistic diversity, in constraining perceptions of irony. Our results indicated that individual differences in mentalizing capacity and the linguistic diversity of one’s residential neighborhood interactively predict irony processing across both appropriateness and perceived irony ratings. Specifically, demonstrating greater mentalizing capacities on an inference task and living in linguistically diverse areas were associated with finding both ironic criticisms and ironic compliments more appropriate, which suggests these individuals may have had greater insight into the pragmatic and social communicative functions of irony. Additionally, perceived irony ratings of ironic compliments also increased with mentalizing and neighborhood language diversity, suggesting familiarity with nonprototypical irony forms. Lastly, these effects were robust to the addition of several control variables, including individual differences in generating non-mental state-based inferences, individual differences in personal language use, and neighborhood socioeconomic status. These results paint a nuanced portrait of how individual cognition and social context dynamically interact to shape pragmatic language comprehension.

Ironic criticisms vs. ironic compliments

We first conducted a manipulation check to assess the sensitivity of our items and task to the evaluative probes. Specifically, we tested for group-level asymmetries of affect, or differences in ironic criticisms vs. ironic compliments. Empirical evidence from monolingual or presumed monolingual samples has shown that ironic criticisms and ironic compliments vary in their social appraisals (e.g., perceived politeness) and comprehension (e.g., making sense; Dews & Winner, 1995; Katz et al., 2013; Kreuz & Link, 2002; Pexman, 2008; Pexman & Olineck, 2002). Our results demonstrate similar patterns among a bilingual sample (see also Tiv et al., 2020).

Ironic criticisms were rated as more appropriate than ironic compliments, which suggests discourse context alters social appraisals of irony. Interestingly, while ironic compliments were rated as less appropriate than literal compliments, ironic criticisms were not rated as more appropriate than literal criticisms. Although this pattern differs from some theoretical predictions (e.g., the tinge hypothesis), it is consistent with other empirical findings, including those from monolingual samples (Joergensen et al., 2021). Response times to ironic criticisms and ironic compliments did not differ, but they were slower than to literal compliments. These results may indicate that readers took more time to reflect on the social appropriateness of ironic statements, although they may have also been affected by differences in naturalness scores. However, such results should be interpreted with caution given the potential noise in how response times were measured.

We discovered a similar pattern for perceived irony, such that ironic criticisms were rated as more ironic than ironic compliments. This finding is consistent with the prediction of the parallel constraint-satisfaction model that ironic criticism interpretations, through increased activation over time, become more prototypical of verbal irony (see additional affectively asymmetric models; Clark & Gerrig, 1984; Sperber & Wilson, 1981). Again, while our response time results corroborated this account (irony ratings for ironic compliments were slower than those for ironic criticisms), we acknowledge the potential for added noise in our measurement. We interpret these patterns as evidence that our task and itemset were sensitive to the social and linguistic features of verbal irony. Additionally, these results suggest that English–French bilinguals, much like English monolinguals, integrate discourse context in their evaluations of and responses to ironic language.

Our primary analysis examined whether ironic criticisms and compliments differentially patterned with mentalizing capacity and neighborhood language diversity. This was the case for perceived irony ratings, as the interaction between mentalizing and neighborhood language diversity was more strongly predictive of changes in perceived irony ratings to ironic compliments than to ironic criticisms. Conversely, ironic criticisms and compliments did not differ in appropriateness when considering the interactions. It may be that when participants were explicitly instructed to focus on irony ratings, they were more perceptive of subtle differences between the conditions (similar results reported in Kreuz & Link, 2002; Tiv et al., 2020). It is also possible that when rating the appropriateness of the statements, which is based in the social acceptability of the utterance, participants conceptualized all ironic remarks within one class.

Mentalizing as an adaptive cognitive response to ecological demands

A large body of research has shown that mentalizing may be a core underlying capacity of irony processing. For instance, the parallel constraint-satisfaction framework of irony comprehension implicates speakers’ and listeners’ capacities to infer others’ mental states in successfully understanding irony, and empirical results from healthy adults reveal mentalizing constrains irony processing (Ivanko et al., 2004; Kaakinen et al., 2014; Pexman, 2008; Pierce et al., 2010). Accordingly, children who have not yet fully acquired mentalizing abilities are also not able to understand irony (Filippova & Astington, 2008; Pexman, 2008; see also Nicholson et al., 2013, for a link between children’s irony processing and empathy). Other evidence from clinical populations, including individuals with autism spectrum disorder, Parkinson’s disease, or schizophrenia, and older adults with mild cognitive impairments or damage to the prefrontal cortex show challenges with both mentalizing and irony comprehension (Gaudreau et al., 2015; Happé, 1993; Langdon et al., 2002; Monetta et al., 2009; Shamay-Tsoory et al., 2005). Still, generalizing findings from clinical to healthy populations is challenging, and other studies do not find a link between mentalizing and irony in clinical populations (Angeleri & Airenti, 2014; Bosco & Gabbatore, 2017; Panzeri et al., 2020; Rapp et al., 2013).

The results from this study provide evidence for an association between mentalizing and irony comprehension, which critically depends on ambient patterns of sociolinguistic context. In neighborhoods of low linguistic diversity, mentalizing is associated with decreased appropriateness ratings to ironic statements, but this same capacity is associated with increased appropriateness ratings to ironic statements in neighborhoods of high linguistic diversity.

We may understand these patterns by thinking about the unique cognitive opportunities and demands afforded by diverse ecologies. A growing body of research by Devos, Sadler, and colleagues (Devos & Sadler, 2019; Sadler & Devos, 2020; Sadler et al., 2021) shows that geographic areas with high racial diversity are associated with less implicit bias toward racial and ethnic minorities. Some plausible reasons for these patterns include greater intergroup contact, more variance in perceptual input, and a generalized understanding that people can differ on many attributes. As language is also a perceptual source of variance, diversity on the basis of language may provide similar opportunities for intergroup contact and variation in input. For instance, someone who lives in a linguistically diverse neighborhood may be more likely to encounter people with different life experiences and perspectives than a person living in a linguistically homogenous neighborhood. Additionally, engaging in interactions in different languages or overhearing conversations with unique linguistic styles can accumulate variation in linguistic input and may broaden one’s understanding of language itself and the people who speak it. Recent work from psycholinguistics and organizational psychology has investigated the cognitive consequences of experiencing diversity and input variation. This work has found that in some cases iterated exposure to diverse information can generalize to flexible social categorization and enhance understanding of unique perspectives (Carter & Phillips, 2017; Crisp & Turner, 2011; Lev-Ari & Sebanz, 2020). Indeed, one may benefit from these consequences by simply being immersed in a diverse environment without necessarily needing to engage in cross-group interactions oneself.

Moreover, mentalizing itself is a flexible cognitive process, such that it can be “stretched” and “withheld” to adaptively respond to situational demands (reviewed in Harris, 2017, pp. 6–10) and resolve ambiguity in intent. Thus, working at the intersection of these findings and theoretical approaches, it is possible that mentalizing is more adaptive in linguistically diverse contexts than in homogenous ones because it can aid in resolving differences in perspective. In contrast, engaging in mentalizing in homogenous contexts, where perspectives may not meaningfully differ, could be inefficient as this process may require the coordination of several cognitive resources. This can explain why in neighborhoods of high linguistic diversity, considering the minds of others is associated with a greater understanding of the social communicative functions of irony, as indexed by greater appropriateness ratings. It may be the case that in these contexts, individuals’ capacity to mentalize is adaptive to the diversity reflected in the environment. In contrast, mentalizing to understand others may not be adaptive in neighborhoods with low linguistic diversity, as there are fewer perspective differences to overcome, resulting in lower appropriateness ratings for ironic statements. Together, these patterns underline the interactive nature of individual and environmental traits in shaping social perceptions of ironic language. It is possible this dynamic interplay is not specific to irony comprehension and may apply more broadly to other forms of ambiguous language, which is for future research to explore.

Individual vs. ecological sources of variation

While the focus of this paper has been on ecological sources of language variation, a natural question following these findings may be how ecological language dynamics relate to an individual’s own language experiences. Many, though not all, individuals self-select where they live, and these locational decisions are based on similar social evaluations as when forming new personal acquaintances (Johnston & Pattie, 2015). While we did not collect social network information from the sample, we did assess social diversity in participants’ own language use through general language entropy (Gullifer & Titone, 2019), which provides a coarse index for how balanced their language use is. We did not find that personal language entropy correlated with neighborhood language diversity (r = −.05), but there was a positive correlation between percent of daily conversations in English and neighborhood language diversity (r = .43). Both personal language use variables were tested as covariates in the models, but the effects of neighborhood language diversity were still detected. Furthermore, substituting neighborhood language diversity with personal language entropy in the interaction term of each model did not yield a statistically significant result.
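As a rough illustration of how an entropy-based index captures balance in language use, the Python sketch below computes Shannon entropy over a person's language-use proportions. It assumes the base-2 Shannon formulation; the exact computation used for the participants is described in the cited work, and the function name and example proportions here are hypothetical.

```python
import math

def language_entropy(proportions):
    """Shannon entropy (in bits) of language-use proportions.

    Inputs are normalized to sum to 1; languages with zero use contribute nothing.
    """
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    probs = [p / total for p in proportions if p > 0]
    return sum(p * math.log2(1 / p) for p in probs)

print(language_entropy([1.0, 0.0]))  # 0.0 (only one language used)
print(language_entropy([0.5, 0.5]))  # 1.0 (two languages used equally)
print(round(language_entropy([0.8, 0.2]), 2))
```

Higher values indicate more balanced use across languages, which is why the measure serves as a coarse index of personal language diversity.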

These follow-up tests further support the idea that, in certain situations, ambient, environmental exposure can be equally if not more important than direct, personal experiences (Crisp & Turner, 2011). Fan et al. (2015) and Liberman et al. (2017) reported similar findings on these environmental characteristics. Both papers reported that mere exposure to language diversity enhanced performance on a perspective-taking task, with no difference between children who personally knew multiple languages and children who were merely exposed to multiple languages in the social environment. Bice and Kroll (2019) also found evidence that ambient linguistic exposure shapes acquisition of a novel language. Others have experimentally demonstrated that other neighborhood characteristics, such as socioeconomic status (Rickford et al., 2015) or racial diversity (Devos & Sadler, 2019; Sadler & Devos, 2020; Sadler et al., 2021), directly relate to changes in pragmatic language use and implicit social cognitions. In our study, the significant effect of neighborhood language diversity was detected while controlling for neighborhood socioeconomic status. This suggests that exposure to diverse perspectives, experiences, and languages helps in effectively interpreting the intended meaning behind an ironic statement.

We do not expect this pattern to be exclusive to language diversity but rather that language identity is a commonly observable source of individual variation that cues an interlocutor to differences in perspective. For example, neighborhood language diversity slightly correlated with proportion of ethnic minority individuals in each neighborhood (r = .28), and past research has found that presence of minorities encourages deeper, more detailed, and less stereotypic information processing (Crisp & Turner, 2011; Devos & Sadler, 2019; Sadler et al., 2021; Sadler & Devos, 2020). Our results suggest that chronic exposure to linguistic diversity may nudge people towards processing styles that collectively lead to similar benefits.

Limitations

We acknowledge several limitations of this study. The size and demographic composition of our sample may have limited our results. Given the sociolinguistic context of Montréal, Canada, we recruited English–French bilinguals, most of whom were white. We plan to expand beyond this scope in order to draw more general inferences about how diverse bilinguals situated in unique locales may perceive irony. We also encourage others to investigate the patterns found in this paper among larger and more diverse samples.

Additionally, due to methodological decisions in having participants make back-to-back ratings on a standard keyboard, our response time results were coarse, which may have prevented us from detecting subtle individual differences in processing. Our ongoing work leverages eye-tracking as a precise measure of real-time cognitive processing to address this limitation.

Lastly, the correlational nature of our individual differences analysis precludes us from concluding the directionality of our results. For instance, it is possible that individuals who possess certain traits (e.g., appreciation for irony, cognitive flexibility, social status) elect to reside in demographically diverse neighborhoods. While past experimental research (e.g., Rickford et al., 2015) has affirmed that pragmatic language variations directly result from neighborhood characteristics, this relationship may to some extent be bidirectional. We believe longitudinal data, which can track individuals’ migration patterns, will be particularly insightful for understanding the emergent relationship between environmental dynamics and individual cognition.

Conclusion

We integrated theoretical frameworks of ironic language with empirical findings of individual differences in bilingual experience and an ecological perspective to assess irony comprehension in bilingual adults. In doing so, we found that individual differences in mentalizing capacity and neighborhood language diversity interactively predicted appropriateness and perceived irony ratings to ironic statements. Our results support burgeoning frameworks of cognition as a socially contextualized set of processes (López et al., 2021; Titone & Tiv, 2022; Tiv et al., 2022a; Wigdorowitz et al., 2020), in which even ambient environmental attributes contribute to an individual’s understanding of language. We encourage future research to adopt a socially contextualized perspective on human cognition and the implications it has in the real world.