Language-dependent emotions in heritage and second language bilinguals: When physiological reactions deviate from feelings

Aims and objectives: Whether bilinguals show language-dependent emotions often depends on the emotion measure used. Here, we examine if differences between automatic pupil reactions and self-reported feelings in response to an emotional narrative presented in a first, second, or heritage language (HL) indicate different stages of emotion processing. Methodology: German HL speakers of Russian and Turkish (n = 72) and German second language (L2) speakers of English and French (n = 89) listened to a video-based emotional narrative in German or their other language and rated how they felt about it (arousal and valence). We contrasted pupil diameter during the video with a language-specific baseline. Age of acquisition, language use frequency in emotional contexts, and language proficiency were used to verify that HL speakers were balanced simultaneous and L2 speakers unbalanced sequential bilinguals. Data and analysis: Linear mixed-effects models were fitted to the pupillometry data and ordinal logistic models to the self-report data. Findings and conclusions: HL speakers showed similar automatic reactions in both languages but rated the German narrative less emotional. L2 speakers showed weaker automatic reactions in L2 yet rated the narrative similar in both languages. This reversed pattern confirmed that automatic and conscious emotion measures tap into different stages of bilingual emotion processing. Furthermore, language-dependent emotions in self-reports seem to be linked to sociocultural frames that go beyond the scope of context and processing-based explanations. Originality: The study is among the first to systematically examine discrepancies between automatic and conscious measures of bilingual language-dependent emotions with different types of bilinguals and within one experimental paradigm. 
Significance: The findings imply that theories of bilingual emotions need further development to explain consistently and explicitly why language-dependent emotional reactions vary with bilingualism and emotion measures. Methodologically, the findings advocate for multi-measure approaches to enhance the validity of future research.


Introduction
Bilinguals' emotional reactivity can depend on the language used to verbalize emotional situations. This observation has been conceptualized against the background that language co-constructs emotions by providing labels such as "happiness," "anger," and "fear" to categorize and communicate diffuse affectual sensations as emotional experiences (Lindquist et al., 2015). The emotional outcome of linguistic co-construction can vary because languages differ in their emotion mapping and vocabulary (e.g., Jackson et al., 2019), and most bilinguals' lexical knowledge, processing skills, and language use are not equivalent in their languages. Empirically, however, the evidence for language-dependent emotions varies across emotion measures (see Table 1). Automatic, neurophysiological, and cognitive measures of emotion processing provide relatively consistent support for differential and usually weaker reactions to emotion stimuli verbalized in the weaker second language (L2), in contrast to the dominant and more proficient first language (L1), at least for negative or intense stimuli. In contrast, conscious self-reports of perceived emotionality along basic affectual dimensions (arousal and valence) typically suggest similar reactions in L1 and L2. Table 1 summarizes the key characteristics of 11 multi-measure studies that combined neurophysiological or cognitive measures with (meta-cognitive) self-report measures of emotion. Event-related potentials, electrodermal skin conductance responses, pupillometry, and lexical processing mostly confirmed either a main effect of language or a language-emotionality interaction, such that reactions were weaker in L2 or the differences between stimuli of variable arousal/valence were less pronounced in L2, relative to L1. Yet, self-reported feelings on verbal or emoticon scales showed no or opposite effects of language within the same experiments.
The lack of alignment of findings derived from different emotion measures is a common observation in emotion research (Mauss et al., 2005; Mauss & Robinson, 2009), and it has been recognized in the earliest multi-measure approaches to bilingual emotions (e.g., Anooshian & Hertel, 1994). An additional challenge in bilingual emotion research is that self-reported feelings in multi-measure studies often do not differ between languages (see Table 1, last column), whereas single-measure self-report studies typically find that L1 feels "more emotional" than L2 (e.g., Dewaele, 2016; Garrido & Prada, 2021; Pavlenko, 2012; Puntoni et al., 2009). Therefore, it is unclear if the measurement inconsistency represents methodological issues or conceptual differences.
Methodologically, several authors have argued that self-reports of emotion are unreliable, in particular, if they include language themselves (de Langhe et al., 2011; Harris et al., 2006). Furthermore, inter-experimental comparisons are difficult because most studies differ in the types of bilinguals sampled and the emotion tasks used (see Table 1). Several studies tested immigrant populations who lived in an L2 environment and sometimes had acquired both their languages in early childhood (e.g., Anooshian & Hertel, 1994; Caldwell-Harris et al., 2011; Harris et al., 2006). In these contexts, the L2 was the majority language in society and the L1 the minority language used at home, which can then be understood as a heritage language (HL; Montrul, 2008). Others worked with populations who had learned an L2 as a foreign language in school (e.g., Iacozza et al., 2017; Thoma & Baum, 2019; Winskel, 2013).
There are at least three theoretical approaches explaining language-dependent emotions (see also Williams et al., 2020). First, context theory assumes a strong association between emotional learning and habitual language use contexts. As L2s are typically learned later and used in less emotional contexts, they will be associated with less intense emotionality than L1s (Ayçiçeği-Dinn & Caldwell-Harris, 2009; Brase & Mani, 2017; Iacozza et al., 2017). The approach emphasizes the joint age of acquisition (AoA) of emotion and language. It therefore predicts that L1 stays the most emotional language in a bilingual's life even if they start using L2 in similarly emotional contexts and become more proficient. Second, processing theory assumes that as L2 processing is less efficient and poses a higher cognitive load, it will hinder emotion processing, so that emotional reactions to/in L2 are less intense at a particular time (Opitz & Degner, 2012; Sianipar et al., 2015; Thoma & Baum, 2019). Accordingly, language-dependent emotion differences will diminish with increasingly balanced proficiency in both languages. Note that context and processing-based theories are interconnected because if an L2 is acquired earlier and used more frequently, it is more likely to coincide with emotional experiences, and its processing becomes more automatic (Pavlenko, 2012). Third, cultural frame-switching theory presumes that specific languages evoke socioculturally motivated patterns of behavior because they are associated with different sociocultural expectations (e.g., Panayiotou, 2004; Ross et al., 2002; Zhou et al., 2021). If these expectations involve affect, for example, in that Chinese-English bilinguals in the United States report more positive mood when asked in English than in Chinese (Ross et al., 2002), the theory is also called emotional frame switching (Zhou et al., 2021). The approach resembles context theory in its emphasis on experience. However, even if context theory expects equivalence because bilinguals have acquired both languages simultaneously and use them in similar contexts comparably often, frame switching could accommodate language-dependent emotions provided a language evokes specific frames associated with the status of itself or the speech community (He, 2010). Such framing is already reflected in the ideologically biased but psycholinguistically vague term native language (Dewaele et al., 2021). Accordingly, language-induced emotion differences should be more pronounced at the level of conscious self-reported feelings than in automatic emotional reactions (Levenson et al., 2010), and they should emerge if a bilingual's languages can trigger different frames, which has been shown for majority languages and HLs (e.g., Noriega & Blair, 2008).
Against this background, the present research investigated if differential language-dependent emotion effects inferred from an automatic measure of emotional responsiveness (here pupillometry) and self-consciously rated feelings (here arousal and valence icon scales) are methodological noise or indicative of different stages of bilingual emotion processing.

The present study
The aim of this study was to compare balanced simultaneous and unbalanced sequential bilinguals' automatic versus conscious emotional reactions to an emotional narrative told in different languages. We operationalized bilingualism in terms of AoA, language use frequency, and proficiency (Treffers-Daller, 2019). The balanced simultaneous bilinguals should be HL speakers. Due to migration, Turkish and Russian are the most frequent heritage (or minority) languages in Germany (Olfert & Schmitz, 2017). Communities with a Turkish HL background typically consider the HL their family language and have a positive attitude toward it (Bayram & Wright, 2017). Even though the situation is more variable within the Russian HL community, members who maintain Russian as a functional minority language also tend to value the language as part of their identity (for review, Olfert & Schmitz, 2017). The subsample of more unbalanced sequential bilinguals should use German as their first language (L1) and either English or French as a second language (L2) predominantly in instructional, educational, or work contexts. When comparing it with German, we refer to the other language (HL or L2) as Lx.
We used pupil dilation during listening to the narrative as an automatic measure as it reflects unbiased reactions of the sympathetic nervous system to emotional stimuli (Bradley et al., 2008). Furthermore, it allows for more authentic emotion tasks than decision-based reaction time experiments and is less invasive than measuring skin or brain activity (Iacozza et al., 2017; Thoma & Baum, 2019).
Crossing these types of bilinguals and emotion measures allowed us to test conflicting predictions made by the three theories of language-dependent emotions. For HL speakers, context theory predicts that they show equal emotional reactions in German and Lx, provided the AoA and relevant language use contexts are similar. Processing theory also expects similar reactions from HL speakers, yet for the different reason that comparable language processing fluency enables comparable emotion processing. Frame-switching theory allows for language-dependent emotions even if AoA, emotion-relevant language use, and processing skills in German and Lx are comparable, but one language fits better with emotion-relevant expectations.
For unbalanced sequential L2 speakers, context theory predicts stronger automatic responses in German than in Lx because the earlier AoA of German in emotional contexts leads to stronger bodily "emotional resonance." In principle, the automatic L1 emotion advantage should permeate to the subsequent conscious level of feelings. Note that authors who used context-based explanations and found incongruent automatic and conscious emotion differences have only provided methodological explanations, if any, for inconsistencies (e.g., Anooshian & Hertel, 1994; Caldwell-Harris & Ayçiçeği-Dinn, 2009; Iacozza et al., 2017). Processing theory assumes stronger automatic emotional reactions in L1 because semantic processing is less fluent in Lx. Yet, at the conscious rating level, language differences may disappear because language and emotion processing are sensitive to time and attention regulation (Opitz & Degner, 2012; Thoma, 2021). It is difficult to say if frame-switching induces automatic emotional reactions, but the theory certainly allows for language-dependent feelings because any language can trigger different attitudes and expectations (e.g., Edwards & Fuchs, 2018; Nederstigt & Hilberink-Schulpen, 2018). In sum, at least context and processing theory jointly lead to the first hypothesis:
H1: Balanced simultaneous bilinguals are less sensitive in their automatic pupillary reactions when exposed to a German versus Lx emotional stimulus than unbalanced sequential bilinguals are.
In contrast to context theory, which does not expect variation between language-dependent automatic and self-conscious emotions, processing theory suggests that slower, more controlled, and deliberate processing reflected in self-reported feelings might modulate early automatic language-dependent emotion differences. Frame-switching theory predicts that language-induced differences are more distinct if languages are associated with distinct sociocultural values or attitudes. In conjunction, we therefore hypothesize:
H2: Balanced simultaneous bilinguals report larger differences in language-dependent feelings than unbalanced sequential bilinguals do.

Method
Participants. The study included 161 university students who received €10 as a thank you. They were from four different bilingual populations living in an urban area in Germany and proficient in the majority language German. A group of Russian-German and Turkish-German bilinguals were from second- or third-generation immigrant families. Furthermore, a group of German-English and German-French bilinguals had initially learned English and French as foreign languages. We report the biographic and bilingualism statistics of the four subsamples at the beginning of the "Results" section.
Materials. We used a 10-second street traffic surveillance video as a baseline for pupil size under unemotional yet language-dependent processing load. Its audio track contained a traffic update stating there were no special incidents in the region without using emotion or emotion-laden words. The emotional narrative was presented as a high-arousal 58-second video called Hochzeitstag ("Wedding Day," Minghella, 2006) previously used by Thoma and Baum (2019). It features an elderly lady in a cemetery who receives flowers from a delivery service her late husband had preordered for their wedding day. The narrator reads an accompanying letter in the voice of the husband. The commercial won several awards for conveying love, dignity, and surprise. We used the German and English audio track spoken by the same balanced bilingual male actor with a mature, sonorant voice from Thoma and Baum (2019) and comparable speakers of French, Russian, and Turkish who read close translations of the narrative (see Supplement B for transcriptions). As the video was 14 years old and originally aired in Switzerland, it was unlikely to be known by participants. In qualitative pretests, representatives of all four bilingual subsamples evaluated the setting and portrayed interpersonal relationship as plausible and culturally appropriate. To have a control measure for individual differences in parasympathetic pupil reactions to light (Bradley et al., 2008), we estimated the mean perceived relative luminance per video second computed with the luminous efficiency function (Poynton, 2012, Eq. 24.1) integrating red, green, and blue tristimulus values of the individual still images.
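The luminance control described above can be illustrated with a minimal Python sketch. It assumes that the cited luminous efficiency function corresponds to the widely used Rec. 709 weighting of linear red, green, and blue components; the function names, the nested-list frame format, and the frame rate parameter are hypothetical and are not taken from the study's materials.

```python
# Rec. 709 weights for linear-light R, G, B (assumption: these match the
# luminous efficiency weighting referenced in the text).
REC709_WEIGHTS = (0.2126, 0.7152, 0.0722)


def relative_luminance(rgb):
    """Relative luminance of one pixel given linear R, G, B values in [0, 1]."""
    r, g, b = rgb
    wr, wg, wb = REC709_WEIGHTS
    return wr * r + wg * g + wb * b


def mean_frame_luminance(frame):
    """Mean relative luminance over all pixels of one still image.

    `frame` is a list of rows, each row a list of (R, G, B) tuples.
    """
    values = [relative_luminance(px) for row in frame for px in row]
    return sum(values) / len(values)


def luminance_per_second(frames, fps):
    """Average the frame-wise luminance within each video second."""
    per_frame = [mean_frame_luminance(f) for f in frames]
    seconds = []
    for start in range(0, len(per_frame), fps):
        chunk = per_frame[start:start + fps]
        seconds.append(sum(chunk) / len(chunk))
    return seconds
```

The resulting per-second values could then enter a regression model as a covariate; whether the original pipeline linearized the RGB values before weighting is not stated in the text.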
Procedure. Participants were invited in an email newsletter and via personal contacts to a study about emotionality in international advertising. Upon arrival in the eye-tracking lab, they provided written informed consent and were then randomly assigned to the German or the matching Lx conditions. After assignment, all instructions and tasks were in the language of the condition or non-verbal. Participants were seated approximately 70 cm in front of a 24″ computer screen connected to a remote SMI RED 500 Hz eye-tracking system. Room lighting was artificial and constant at 590 lux. The experiment started with a 9-point eye-tracking calibration that was repeated until position accuracy was ≤ 0.5° of visual angle. To obtain a mood baseline and allow practicing response by eye fixation, participants indicated how they felt before the experiment on a simple Visual Analogue Mood Scale (VAMS; see van Rijsbergen et al., 2012 for review). They saw a 10-cm line with "happy" and "sad" poles, numbers 0 to 5 under the line, and five emojis on top and fixated the preferred emoji for 2 seconds to respond. Next, participants saw the baseline video followed by a 1-second fixation screen and the emotional target video (see Figure 2 for the timeline). Directly afterward, they read the instruction "How do you feel about this video?" in the language of the video and reported their response by fixating on one of the icons on 5-point SAM (Self-Assessment Manikin) arousal and valence scales (Lang, 1980). Participants then switched to another computer to answer a survey on their background data, AoA, language use, and proficiency self-assessment before they were paid and debriefed.

Pupil data preparation.
A Python script pre-processed the pupillometry data according to conventional procedures (Kinner et al., 2017; Lemercier et al., 2014; Thoma & Baum, 2019). The script deleted recordings during blinks and computed a mean diameter from the left and right pupil size for each measurement. Pupil dilation was defined as the difference between each participant's pupil diameter during the target video and their mean pupil size observed during the baseline video, so that cognitive load of processing audio input in a particular language should be controlled. To reduce measurement noise, the script finally down-sampled the dilation values from 500 Hz to 1-second epochs. Data analysis was confined to a time of interest during the target video. It started 1 second after the narrator voice set in and stopped 3 seconds after the voice ended (see Figure 2) to include only epochs where language could have made a difference.
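The preprocessing steps described above can be sketched as follows. This is an illustrative reconstruction rather than the original script: the function names are hypothetical, blink samples are assumed to be marked as None, and a sample is assumed to be discarded when either eye is missing.

```python
from statistics import mean


def binocular_mean(left, right):
    """Average left and right pupil diameter; None marks a blink sample."""
    if left is None or right is None:
        return None
    return (left + right) / 2.0


def baseline_corrected(target, baseline):
    """Subtract the participant's mean baseline diameter from each target sample."""
    base = mean([s for s in baseline if s is not None])
    return [None if s is None else s - base for s in target]


def downsample(samples, rate_hz=500):
    """Collapse samples into 1-second epochs by averaging the valid samples.

    An epoch that contains only blink samples is returned as None.
    """
    epochs = []
    for start in range(0, len(samples), rate_hz):
        valid = [s for s in samples[start:start + rate_hz] if s is not None]
        epochs.append(mean(valid) if valid else None)
    return epochs
```

Applied to one participant, the three functions would be chained in this order: binocular averaging per sample, subtraction of the baseline-video mean, and aggregation into 1-second epochs, after which the epochs outside the time of interest would be dropped.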

Results
Validation of sampling. Before testing the hypotheses, we validated if the four subsamples qualified as HL speakers and L2 speakers considering self-reported AoA, language use frequency in emotional contexts, and self-assessed language proficiency. Due to the large number of theoretically possible comparisons, we only report selected differences.
First, Table 2 shows that the Russian-German and Turkish-German speakers had acquired both of their languages in early childhood in family settings, and their AoA did not differ significantly between the languages, F(1, 70) = 0.37, p = .543, η² = .01, or between the subsamples, F(1, 70) = 0.47, p = .494, η² = .01. The German-English and German-French participants had acquired German from birth in their families, while they had initially learned English or French as a foreign language in school. The AoA differences of about 9 years between their L1 and L2 were significant, F(1, 88) = 1046.40, p < .001, η² = .92, while the AoA of English and French was similar, t(87) = −0.74, p = .459, d = −0.17.
Second, we estimated language use frequency in emotional contexts for each language from the responses to two-item ascending 7-point rating scales. For each language, participants rated its use in contexts with family and friends (i.e., presumably emotional contexts) as well as in work and educational contexts. From these ratings, we computed a ratio score indexing the use of Lx in each context relative to the use of German (ratio = Lx/(Lx + German)). A ratio of 1 indicates exclusive use of Lx in a context. Figure 1 plots the mean ratios for each sample of bilinguals showing, for example, that German was clearly the dominant language in work and education settings for all groups. Importantly, a repeated measures analysis of variance (ANOVA) confirmed that whereas the use of German or Lx differed significantly between family-friends and work-education contexts for all bilingual samples, F(1, 157) = 44.57, p < .001, η² = .22, the pattern was reversed for HL and L2 speakers, F(1, 157) = 89.76, p < .001, η² = .63. In fact, both HL samples used Lx more often in emotional family-friends contexts than for work and education (compare the first and second bars, respectively, in Figure 1), Russian: t(39) = 13.66, p < .001, d = 0.14; Turkish: t(31) = 7.15, p < .001, d = 0.14. Both L2 samples, in contrast, used Lx less frequently with family and friends than for work and education, English: t(57) = −5.98, p < .001, d = 0.15; French: t(30) = −3.43, p = .002, d = 0.11.
Third, participants self-assessed their speaking, reading, writing, listening, and grammar skills on 7-point scales in both of their languages. The ratings were first transformed into a proportionate mean proficiency score ranging from 0 (lowest) to 1 (highest) and then into an Lx self-assessment ratio score relative to German in analogy to the use ratio reported above. If the ratio is 0.5, this indicates balanced proficiency in Lx and German. As the striped bars in Figure 1 suggest, HL speakers' German and Lx proficiency were similar, whereas L2 English and French skills were lower, compared to L1 German. A repeated measures ANOVA comparing Lx versus German self-assessments confirmed a main effect of language, F(1, 157) = 104.74, p < .001, η² = .40, and an interaction with sample, F(3, 157) = 17.34, p < .001, η² = .25. In paired comparisons, Russian, t(39) = 1.20, p = .237, d = 0.19, and Turkish, t(31) = 1.79, p = .082, d = 0.32, speakers rated their Lx proficiency similar to German. In contrast, the German-English, t(51) = 13.48, p < .001, d = 1.77, and German-French, t(51) = 9.22, p < .001, d = 1.67, samples evaluated their proficiency in Lx substantially lower than in German.
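The two ratio scores can be written out in a few lines of Python. The use ratio follows the formula given in the text. The rescaling of the 7-point proficiency ratings to the 0 to 1 range is assumed here to be linear, which the text does not specify, and the function names are hypothetical.

```python
def lx_ratio(lx, german):
    """Share of Lx relative to both languages: Lx / (Lx + German).

    A value of 0.5 indicates balance; values above 0.5 indicate Lx dominance.
    Raises ZeroDivisionError if both inputs are 0.
    """
    return lx / (lx + german)


def proficiency_score(ratings, scale_min=1, scale_max=7):
    """Rescale the mean of several skill ratings to the 0-1 range.

    Assumption: a linear mapping from the rating scale to [0, 1].
    """
    avg = sum(ratings) / len(ratings)
    return (avg - scale_min) / (scale_max - scale_min)


def proficiency_ratio(lx_ratings, german_ratings):
    """Lx self-assessment ratio relative to German, analogous to the use ratio."""
    return lx_ratio(proficiency_score(lx_ratings), proficiency_score(german_ratings))
```

For instance, a participant who rates all five skills identically in both languages obtains a proficiency ratio of 0.5, the value the text describes as balanced proficiency.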
Based on the AoA, language use, and proficiency data, we concluded that the Russian-German and Turkish-German HL subsamples were balanced simultaneous bilinguals. As expected, the German-English and German-French samples were sequential unbalanced L2 speakers.
Pupil dilation. As the line plot in Figure 2 illustrates, pupil dilation tends to evolve over time, and it is generally highly susceptible to individual differences (Lemercier et al., 2014). Therefore, we analyzed the pupil data with linear mixed-effects regression models using the lmer function from the lme4 package (Bates et al., 2021) and lmerTest (Kuznetsova et al., 2017) to estimate p values in RStudio (RStudio Team, 2021). The categorical fixed factors were deviation-coded with bilingual status (HL = −.5 vs L2 = .5) and language of the video (LoV; German = −.5 vs Lx = .5). The continuous co-variates mood baseline, epoch, and luminance were centered to achieve comparable regression estimates. Pre-experimental mood was similar in the four comparison groups (see Table 4). Interactions beyond the two categorical predictors did not improve model fit (backward fitting via log-likelihood comparisons; Barr et al., 2013). All models included by-participant random intercepts and slopes for bilingual status and LoV, and a random intercept for epoch.
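The predictor coding described above can be illustrated independently of the model-fitting software. The following Python sketch applies the stated deviation codes and mean-centers the covariates; the record format and all names are hypothetical, and the sketch does not fit the mixed-effects model itself.

```python
# Deviation codes as stated in the text.
STATUS_CODES = {"HL": -0.5, "L2": 0.5}
LOV_CODES = {"German": -0.5, "Lx": 0.5}
COVARIATES = ("mood", "epoch", "luminance")


def center(values):
    """Subtract the sample mean so that the centered values average to zero."""
    mean_value = sum(values) / len(values)
    return [v - mean_value for v in values]


def build_design(records):
    """Return one coded row per record (a dict with status, lov, and covariates)."""
    centered = {name: center([r[name] for r in records]) for name in COVARIATES}
    rows = []
    for i, r in enumerate(records):
        status = STATUS_CODES[r["status"]]
        lov = LOV_CODES[r["lov"]]
        # The interaction term is the product of the two deviation codes.
        row = {"status": status, "lov": lov, "status_x_lov": status * lov}
        for name in COVARIATES:
            row[name + "_c"] = centered[name][i]
        rows.append(row)
    return rows
```

With deviation coding, each main-effect estimate can be read as an effect averaged over the levels of the other factor, which is one common motivation for this coding scheme when an interaction term is included.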
Among the main effects in the model predicting pupil dilation (Table 3), only epoch was significant. Pupil dilation showed the typical, decreasing pattern after an initial peak when the narration set in across all conditions (see Figure 2). Notably, bilingual status and LoV interacted significantly as visualized in Panel (a) of Figure 3. Separate models fitted to the subsample data split by bilingual status showed that while heritage speakers' pupil responses were very similar in German and Lx, b = 0.020, SE = 0.063, t = 0.32, p = .751, L2 speakers showed significantly stronger pupil dilation in German than in Lx, b = −0.174, SE = 0.058, t = −3.02, p = .003.
Self-reported arousal and valence. Table 4 summarizes the descriptive values for SAM arousal and valence reported by the four comparison groups. Note that the valence scale was bi-polar (−2 = sad, 2 = happy) and therefore recoded, so that larger numbers symbolize stronger negative or positive valence. We used ordinal logistic regression models to predict arousal and valence ratings (polr function from MASS package, version 7.3-54; Venables & Ripley, 2007). Predictors were deviation-coded bilingual status, LoV, and their interaction as well as the centered mood score.
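To make the ordinal logistic model more concrete, the following Python sketch computes category probabilities under a proportional-odds model with the parameterization logit P(Y ≤ k) = ζ_k − η, where η is the linear predictor and ζ_k are ordered thresholds. The threshold values in the usage note are invented for illustration and are not estimates from the study.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(eta, thresholds):
    """Probabilities of each ordered category under a proportional-odds model.

    `thresholds` must be sorted in ascending order; k thresholds yield k + 1
    categories, with logit P(Y <= k) = threshold_k - eta.
    """
    cumulative = [logistic(z - eta) for z in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for c in cumulative:
        probabilities.append(c - previous)
        previous = c
    return probabilities
```

With four hypothetical thresholds such as [-2.0, -1.0, 0.5, 2.0] for a 5-point scale, increasing η shifts probability mass toward the higher rating categories, which is how a positive coefficient for a predictor is interpreted in this type of model.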
The model predicting arousal (Table 5) found a significant main effect of bilingual status because the L2 group stated overall higher arousal than the HL group. The main effect of LoV was also significant with higher ratings in Lx than German. These main effects were qualified by a significant interaction (see Panel (b) of Figure 3), such that HL speakers experienced language-dependent arousal differences, b = 1.735, SE = 0.475, t = 3.65, p < .001, whereas L2 speakers did not, b = 0.049, SE = 0.435, t = 0.11, p = .910. In the valence model (Table 5), LoV showed a significant main effect with Lx eliciting stronger positive and negative feelings. The effect of bilingual status and the two-way interaction did not reach significance even though the valence advantage of Lx relative to German was significant in the HL group, b = 1.465, SE = 0.471, t = 3.11, p = .001, but weak in the L2 group, b = 0.751, SE = 0.444, t = 1.69, p = .091.

Discussion
We examined if balanced simultaneous bilinguals and unbalanced sequential bilinguals respond differently to an emotional narrative in their languages in automatic and self-conscious measures of emotion. Based on self-reported AoA, language use, and proficiency, we defined two samples of Russian and Turkish HL speakers as balanced simultaneous bilinguals because they had acquired German and the Lx in early childhood, used both languages comparably often in presumably emotional contexts with family and friends, and had similar overall proficiencies. Two samples of German-English and German-French speakers qualified as unbalanced sequential bilinguals because they had acquired Lx English or French as a foreign language substantially later than their L1 German, reported using Lx predominantly in educational and work contexts, and rated their Lx proficiency lower.
The two types of bilinguals showed a reversed pattern of language-dependent emotional reactions measured in terms of automatic pupillary responsiveness, compared to self-reported arousal and valence ratings. Heritage speakers' pupil dilation was similar in both languages, while they rated the German narrative of the same story less emotional than the Lx one. The main and interaction effects of bilingual status were significant for arousal and indicated a similar trend for valence ratings. Since language-dependent emotion differences seem to increase with task emotionality (Thoma, 2021) and the story in the video is predominantly heart-warming and moving rather than pleasant or unpleasant, this discrepancy between arousal and valence could be a task effect. Importantly, L2 speakers showed weaker automatic reactions in Lx than German, yet they rated the narrative similarly emotional in both languages.
We interpret these results in the light of two hypotheses. First, prior work designed against the background of context and processing theories of bilingual emotions has rarely sampled balanced simultaneous bilinguals because they should not show any interesting differences. The present results confirmed this with the data obtained from the automatic measure. Also consistent with prior studies (e.g., Iacozza et al., 2017; Thoma & Baum, 2019), our unbalanced sequential bilinguals' pupil reactions were stronger in German than in Lx, suggesting stronger automatic emotional reactions in German. Our experimental paradigm integrated both populations. It revealed a significant interaction between the listeners' bilingual status and the language of the emotional narrative. This confirmed our H1, in that balanced simultaneous (HL) bilinguals' automatic pupillary reactions were less sensitive in response to a German versus Lx emotional stimulus, compared to unbalanced sequential (L2) bilinguals.
In parallel to most multi-measure studies reviewed in Table 1, the language-induced difference at the automatic level did not permeate to self-reported arousal and valence. This is difficult to explain for context theory because the later AoA and less emotional use contexts of Lx should lead to persistently weaker emotions in Lx, that is, L2. Processing theory, in contrast, can accommodate this disparity. Here, less efficient L2 language processing interferes with early automatic emotion processing, which leads to disparate emotional reactivity in L1 and L2. Self-reported feelings, however, also represent the outcome of later, more deliberate stages of emotion processing, where L2 processing catches up (Opitz & Degner, 2012) and different emotion regulation mechanisms are active (Thoma, 2021), so that the initial language-dependent disparity cannot always be observed. To further validate these findings, future work could include unbalanced simultaneous and balanced sequential bilinguals.
Second, context and processing theory would not have predicted that balanced simultaneous bilinguals rate the narrative less (or more) emotionally intense in their majority language German than in their HL Russian or Turkish. Within a frame-switching account, an HL can trigger sociocultural expectations, so that an HL narrative feels more emotional (Noriega & Blair, 2008; Zhou et al., 2021). Several multi-measure studies (Table 1) did not find a main effect of language at the level of reported feelings. Therefore, we hypothesized (H2) that balanced simultaneous bilinguals report larger differences between feelings induced by German versus HL than unbalanced sequential bilinguals between German versus L2. A significant interaction in the arousal model and a similar trend for valence confirmed H2. HL speakers rated the narrative more emotional in their HL presumably because it triggered the frame of a native language or mother tongue (Dewaele et al., 2021). The HL frame created a more appropriate "emotional fit" (Zhou et al., 2021) with the situation presented in the emotional narrative video than German although experience and learning have trained our balanced simultaneous bilinguals to process emotional stimuli equally well in both languages. Our results do not warrant claims about frames that are specific to the Russian or Turkish heritage culture, but the parallel effects in both HL groups suggest that the sociocultural function of HL in general (He, 2010) influences self-reported feelings. Further research could substantiate this interpretation by eliciting and evaluating language-dependent frames.
In fact, positive attitudes toward L2 English and French (e.g., Edwards & Fuchs, 2018) and expectations about the language choice in advertisements (Nederstigt & Hilberink-Schulpen, 2018) could also explain why the reduced automatic emotional reactivity in L2 did not surface in self-conscious feelings: positive attitudes may have offset the disparity at the automatic level. However, there are at least five other explanations for language-independent feelings in the L2 groups. Initial differences (1) may disappear after long enough processing (Opitz & Degner, 2012), or (2) they could be offset by emotion regulation (Thoma, 2021). The L2 (3) may also trigger different processing modes (de Langhe et al., 2011) or (4) decision modes (Keysar et al., 2012), and (5) reduced familiarity of L2 could produce a novelty effect that made the L2 narrative more attractive (Ayçiçeği-Dinn & Caldwell-Harris, 2009). The trend toward higher perceived valence in L2 may indicate interest and attitude effects, but our research is limited in that we did not collect the data to test those explanations, for example, on language attitudes, processing mode, or interest in the narrative video. While this could be addressed in future research, the limitation that we did not use the same language pairs in the HL and L2 groups is more difficult to overcome because, for example, Turkish is not taught as a foreign language in regular German schools and French heritage speakers are rare in Germany. It may be promising, however, to work with multiple emotion stimuli that also vary socioculturally to disentangle effects of language and culture.
In combination, the current findings support the view that automatic neurophysiological and conscious meta-cognitive measures of emotion tap into different stages or dimensions of emotion processing or, in other words, that language can induce differences between intuitive emotional reactions and feelings. Theories of bilingual emotions need further development to explain these incongruent emotional reactions consistently and explicitly. Methodologically, the findings advocate for multi-measure approaches to enhance the validity of future research on bilingual emotions.

Figure 1. Means and confidence intervals for the language use frequency and self-assessment ratio scores across the four subsamples and their languages. A ratio of 0.50 indicates balance between German and Lx.

Figure 2. Pupil diameter change as a function of time (epoch), bilingual status, and language of the video.

Figure 3. Co-variate adjusted interaction plots for bilingual status × language of the video with 95% CIs for (a) pupil dilation and (b) self-rated arousal. Labels: Heritage: group of Russian and Turkish heritage speakers; L2: group of English and French second language speakers; Lx: language other than German (Heritage or L2); GER: German.

Table 1. Systematic review of multi-measure studies on bilinguals' language-dependent emotions.

Table 2. Subsamples of bilinguals and their AoA of German and their other language (Lx).

Table 3. Linear mixed-effects regression model predicting pupil dilation.

Table 4. Means and standard deviations for mood, arousal, and valence.

Table 5. Linear regression models predicting self-reported arousal and valence.