Differences in acute stress responses depending on first or second language in a Hispanic-American sample

Abstract
Using a second language is a daily experience for many people today, among them many migrants. To determine whether speaking a second language induces a stronger cortisol or alpha-amylase (sAA) response than a first language, we tested a Hispanic-American sample in two Trier Social Stress Test (TSST) conditions: first (Spanish) and second (German) language. Thirty-two participants (64.5% female) between the ages of 19 and 53 years (mean = 30.68) from Latin America were tested (15 in Spanish, 17 in German). Participants were randomized to a German or Spanish version of the TSST, gave six saliva samples and completed questionnaires on perceived threat and stress, positive and negative affect as well as state-anxiety. A significantly higher stress response was found in the German condition for salivary cortisol, but not for sAA. Self-report showed significantly higher perceived threat and negative affect after the TSST for the German compared to the first language condition. Speaking a second compared to a first language in a challenging situation appeared to be more stressful and threatening for participants. Further, reported increases in state-anxiety appeared to be higher in the German condition, even though group differences did not reach significance. A more detailed investigation of underlying, stress-inducing mechanisms should be considered in future studies as well as associations with language proficiency and improvements over time.


Introduction
Migration movements have increased globally within the last decades, and the number of international migrants reached its highest level in 2020 (International Organization for Migration, 2021). Therefore, consequences of post-migration stressors, such as experiences of loss, isolation, uncertainties concerning the residence status as well as experiences of stigma and discrimination are relevant when considering migrants' health status (Demiralay & Haasen, 2011; Erim & Morawa, 2016; Esses, 2021; Flores et al., 2008). To understand processes that are initiated by the arrival in a foreign country, Berry et al. (1987) introduced the concept of acculturation and described it as psychological changes due to contact between two cultural groups. Stressors arising from this are referred to as acculturative stress. Research showed positive associations of acculturative stress with physiological stress markers, blunted cortisol awakening responses as well as declines in mental health, feelings of marginality, alienation and identity confusion (Berry et al., 1987; Garcia et al., 2017; Mangold et al., 2010; Scholaske et al., 2021). These health outcomes might be mediated by chronic stress experiences (Finch & Vega, 2003), which are associated with long-term changes of stress hormones and inflammation markers (Juster et al., 2010; Marsland et al., 2017; Rohleder, 2019) and may lead to dysfunctions of the stress systems (McEwen, 1998). Belonging to an (ethnic) minority influences the type and frequency of stressful experiences as well as available resources and coping strategies, which is also called Minority Status Stress (Flores et al., 2008; Slavin et al., 1991). It follows that stressors experienced in the past might enhance vulnerability to the effects of future stressors through underlying mechanisms such as vigilance and higher sensitivity (Flores et al., 2008; Slavin et al., 1991).
This is underscored by results of studies that found differences in hair cortisol or heart rate variability (HRV) between immigrant or minority and native samples, e.g. for African Americans (Hill & Thayer, 2019), for Turkish immigrants and asylum seekers in Germany (Fischer et al., 2017; Mewes et al., 2017).
Considering linguistic challenges in the framework of acculturative stress, being obliged to constantly speak a new language represents a recurring, daily stressor that affects social participation in the foreign country and reduces the interactional scope of a person (Demiralay & Haasen, 2011; Halim et al., 2017; Itzhak et al., 2017). Lower language proficiency, high language conflict or language-based discrimination were found to be associated with higher levels of acculturative stress, general and somatic distress, lower income as well as lower self-rated health (Brown et al., 2010; Finch et al., 2004; Halim et al., 2017; Lueck & Wilson, 2010).
The stress-inducing mechanism of speaking a second language can be explained by Foreign Language Anxiety (FLA), a phenomenon that appears in language classes and describes the fear of performance evaluation in an academic or social context (Horwitz et al., 1986). Social-evaluative situations, among which such language classes may be classified, induce an acute stress response in the form of increasing cortisol and proinflammatory cytokines. Especially in younger bilingual speakers, foreign language anxiety was associated with higher cortisol reactivity (Legatzke & Gettler, 2021). Fischer et al. (2019) examined second language speaking in Swiss men who spoke Swiss versus standard German in an experimental stress-inducing scenario, the Trier Social Stress Test (TSST). Speaking standard German induced a significantly stronger cortisol response compared to Swiss German. The second language seemed to be an incremental stressor to the already stressful social-evaluative situation.
The aim of our study was to expand the results by Fischer et al. (2019) and examine an immigrant, Hispanic-American sample living in Germany with Spanish as first (L1) and German as second language (L2). First, we hypothesized higher biological stress response curves (condition by time interaction) for salivary cortisol and alpha-amylase (sAA) in the L2 condition. Second, we expected stress experienced in L2 to induce higher stress and threat perceptions compared to the L1 condition, measured with the Primary appraisal and secondary appraisal (PASA) questionnaire. Third, we expected a stronger increase in negative affect and state-anxiety, obtained with the Positive and Negative Affect Schedule (PANAS) and the State-Trait-Anxiety-Depression-Inventory (STADI), during the TSST for the L2 condition.

Participants
Participants were recruited from the Friedrich-Alexander-Universität Erlangen-Nürnberg campus via print and multimedia advertising as well as via local associations with reference to Middle or South America. Each participant received a monetary compensation of 20 €. Before getting an invitation to the laboratory testing, participants' eligibility was assessed via an online screening questionnaire using an online survey tool ("Unipark," Questback, Germany). Inclusion criteria were: (1) minimum age of 18 years, (2) Spanish as native language and origin from Middle and South America (further referred to as "Hispanic Americans"), (3) minimum B1 level (intermediate) German language proficiency according to the Common European Framework of Reference for Languages (CEFR), (4) smoking no more than 10 cigarettes per week, (5) no physical or mental disorders, (6) no drug intake (e.g. glucocorticoids, anti-depressants) with the exception of hormonal contraceptives in women, (7) no previous experiences with the stress protocol and (8) no self-reported depressive symptoms. To measure depressive symptoms we used the Spanish version of the Center for Epidemiological Studies Depression Scale (CES-D; Roberts et al., 1989), originally by Radloff (1977). Participants who reached cutoff scores were excluded from the study, since this could be an indicator for depressive symptomatology, which may influence the cortisol response (Fiksdal et al., 2019).
A power analysis with G*Power (3.1.9.7) was conducted to determine the necessary sample size (Faul et al., 2009). We followed the effect sizes that Fischer et al. (2019) found in their study between the two conditions. This calculation resulted in N = 54 when power was set at .80, the alpha error set at .05, and medium correlations between the repeated measures were indicated. As we expected participant recruitment to be difficult due to the health and language requirements and a rather low number of reachable and eligible participants in our region, we set the sample size to 20 participants per condition. This was decided as we have had good experiences with rather small sample sizes in previous studies (Breines et al., 2015; Gianferante et al., 2014; Nater et al., 2005; Rohleder et al., 2003). Due to dropouts, our final sample consisted of N = 32 healthy adults (62.5% female) with an average age of 30.38 years (SD = 7.6, ranging from 19 to 53 years). Most participants were employed (n = 14) or currently studying (n = 11); others were unemployed, homemakers, or apprentices. General educational attainment was rather high, as 13 participants reported a completed Bachelor's degree and 14 participants a completed Master's degree. The average duration of stay in Germany was 55.3 months (approximately 4.6 years; SD = 55.5), ranging from 4 to 204 months. The participants were born in Latin America; the most frequent countries of origin were Colombia (n = 10), Mexico (n = 7), Argentina (n = 4), and Peru (n = 4). The most frequently mentioned reasons for immigrating to Germany were university studies (n = 19) or family reasons (n = 5). The average length of time that participants had already been learning and speaking German was 8.8 years (SD = 5.3); self-evaluated German language skills were primarily B2 (independent user; n = 13) or C1 (proficient user; n = 11) according to the CEFR.
To verify language proficiency, a language test based on the concept of reduced redundancy, a C-Test (Klein-Braley, 1985), was included in the screening questionnaire. The average percentage of correct answers in the C-Test amounted to 76.8% (SD = 13.6), which checked for minimum B1 levels.
The study was conducted in accordance with the Declaration of Helsinki. The Ethics Committee of the Medical Faculty of the Friedrich-Alexander-Universität Erlangen-Nürnberg gave ethical approval for this study (ethical approval code: 451_19B). All participants gave written informed consent before testing.

Study design
Participants were randomized to participate either in the German or the Spanish TSST condition in a between-subjects design. All investigators were German and spoke German with participants independently of the experimental condition (except for TSSTs). All questionnaires and documents were presented in Spanish to guarantee full comprehension of the experimental procedure.

Procedure
Eligible participants were invited to the laboratory and scheduled between 1:00 pm and 8:00 pm to control for the circadian rhythm of the hypothalamus-pituitary-adrenal (HPA) axis. For the distribution of participants across times of day, there were no significant differences between the two conditions (p = .55). After their arrival, participants were guided into a partitioned room and given verbal and written instructions about the experimental procedure. After informed consent was obtained, a resting phase followed during which participants completed demographic measures; then their weight and height were taken. Approximately 30 minutes after arrival, the first saliva sample was obtained (-1 min) using Salivette devices (blue-top cortisol salivette, Sarstedt, Nümbrecht) and the TSST was introduced and completed in a standardized manner as described below. Following the TSST, additional saliva samples were collected (+1, +10, +20, +30, +45 min). At the end of the laboratory visit, participants were debriefed and dismissed.

Procedural changes due to the Covid-19 pandemic
Since the study was interrupted by the Covid-19 pandemic, 21 participants were tested before and 11 participants during the pandemic. This resulted in minimal changes in the study protocol. Participants and investigators wore face masks, except during the TSST. Further, in the TSST room, Plexiglas plates separated participants and TSST committee members. The results for cortisol and sAA showed no significant differences for participants before and during the pandemic (p > .05).

Acute stress induction
The TSST was used for acute stress induction (Kirschbaum et al., 1993). After the first saliva sample, participants were asked for their dream job by the main experimenter and then guided into a separate room, where one male and one female TSST committee member were already waiting, wearing white lab coats. There, participants were told about the overall procedure, which resembled a job interview: They were given five minutes preparation time during which they were asked to complete the PASA questionnaire (Gaab, 2009). Afterwards, participants were required to give a short speech in front of the panel about their dream job. They were told to emphasize why their personality qualifies them for the job. This was followed by a mental arithmetic task in which they were asked to serially subtract 17 from 2043. If they made a mistake, they were asked to start again. In general, the panel remained facially neutral, gave no feedback at all, and made fictitious notes. The whole procedure was video- and audiotaped for further analyses. In the Spanish version, the TSST was conducted by Spanish native speakers; in the German condition, the TSST was conducted by German research assistants.

Primary appraisal and secondary appraisal
The Primary appraisal and secondary appraisal (PASA) questionnaire, original German version by Gaab (2009), was translated by our research group using the standard procedure of translation and back-translation. It was completed by participants during the TSST. The questionnaire consists of 16 items (e.g. "This situation scares me") with 6-point responses from 1 = completely wrong to 6 = completely correct. It comprises two scales with two subscales each: Primary Appraisal (consisting of threat and challenge) and Secondary Appraisal (consisting of self-concept and control belief). A stress index can be calculated as the difference between primary and secondary appraisal. The subscale threat showed an acceptable internal consistency (α = .77), as did the subscale challenge (α = .71). The subscale self-concept, on the other hand, showed a rather poor internal consistency (α = .66), as did the subscale control belief (α = .62). These two subscales were not used for analyses.
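The internal consistency coefficients (α) reported for the questionnaires can be illustrated with a short sketch of Cronbach's alpha. This is not the SPSS procedure used in the study; the function name `cronbach_alpha` and the example responses are hypothetical and serve only to show how the coefficient is obtained from item-level scores.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Illustrative sketch only; assumes complete data (no missing items).
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item across respondents
    item_variances = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the respondents' sum scores
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five participants to a four-item subscale (1-6 scale)
example = [
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
    [5, 5, 6, 5],
    [3, 3, 3, 4],
]
print(round(cronbach_alpha(example), 2))
```

Values close to 1 indicate that the items of a subscale vary together across participants; values such as those reported for self-concept and control belief indicate weaker agreement among items.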

Positive and negative affect
The Spanish version of the Positive and Negative Affect Schedule (PANAS) was used (Robles & Páez, 2003; originally by Watson et al., 1988). The scale has 20 items, 10 items for positive affect (e.g. "excited") and 10 items for negative affect (e.g. "distressed") with 5-point responses from 1 = very slightly or not at all to 5 = extremely. Participants completed it before and after the stress test (-1 and +1). Internal consistency of the questionnaire was good both before the TSST (α = .87) and after the TSST (α = .84).

Chronic stress
The Spanish version of the Perceived Stress Scale (PSS) with 10 items (PSS-10) by Remor (2006), originally by Cohen et al. (1983), was used to measure chronic stress within the last four weeks. Participants completed this questionnaire after the TSST, during the waiting period between saliva measurements. The scale consists of 10 items (e.g. "In the last month, how often have you felt that you were unable to control the important things in your life?") with a 5-point response scale (0 = never to 4 = very often). Internal consistency was good (α = .88).

State-anxiety
State-anxiety was measured with the State-Trait-Anxiety-Depression-Inventory (STADI), the original German version by Laux et al. (2013). The inventory was translated by our research group using the standard procedure of translation and back-translation. Participants completed it before and after the stress test (-1 and +1). The inventory consists of 20 items (e.g. "My heart beats quickly."), which refer to the current state or can be considered in general. The items are rated on a 4-point response scale from 1 (not at all) to 4 (very much). For this study, only the state inventory was assessed before and after the TSST, and only the anxiety subscale was used. Internal consistency for the anxiety subscale was good before (α = .84) and after the TSST (α = .84).

Self-reported depressive symptoms
To measure depressive symptoms, the Center for Epidemiologic Studies Depression Scale (CES-D), originally by Radloff (1977), was used in its Spanish version (Roberts et al., 1989) and administered during the study's online screening. The scale comprises 20 items (e.g. "I thought my life had been a failure.") that are rated on a 4-point response scale (0 = rarely or none of the time/less than 1 day to 3 = most or all of the time/5-7 days) and assess depressive symptoms within the last 7 days. After reverse coding four items, overall sum scores were computed. The scale's internal consistency was rather poor (α = .59).
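The scoring step described above (reverse coding four items, then summing) can be sketched as follows. The text does not state which four items were reverse coded; the sketch assumes the positively worded items 4, 8, 12 and 16 of the standard CES-D, and the function name `cesd_sum_score` is hypothetical.

```python
# Assumption: items 4, 8, 12 and 16 (1-based) are the positively worded
# CES-D items that are reverse coded; the article does not list them.
REVERSED_ITEMS = (4, 8, 12, 16)


def cesd_sum_score(responses, reversed_items=REVERSED_ITEMS, max_score=3):
    """Sum score for 20 CES-D responses coded 0-3, after reverse coding."""
    if len(responses) != 20:
        raise ValueError("the CES-D has 20 items")
    total = 0
    for item_number, score in enumerate(responses, start=1):
        if not 0 <= score <= max_score:
            raise ValueError(f"item {item_number}: score {score} out of range")
        # Reverse-coded items: 0 becomes 3, 1 becomes 2, and so on
        total += (max_score - score) if item_number in reversed_items else score
    return total


# Hypothetical participant who answers 0 on every item: only the four
# reverse-coded items contribute, each with 3 points.
print(cesd_sum_score([0] * 20))  # 12
```

Sum scores obtained this way range from 0 to 60, and the screening cutoff is applied to this total.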

Salivary cortisol and alpha-amylase
Saliva samples were stored at -30 °C after collection for later analyses. Before cortisol and sAA measurement, two freeze-thaw cycles were performed. Immediately before measurement, samples were centrifuged at 2,000 × g and 20 °C for 10 min. sAA was measured with an in-house enzyme kinetic assay using reagents from DiaSys Diagnostic Systems GmbH (Holzheim, Germany), as previously described (Bosch et al., 2003; Rohleder & Nater, 2009). In brief, saliva was diluted at 1:625 with ultrapure water, and diluted saliva was incubated with substrate reagent (α-amylase CC FS; DiaSys Diagnostic Systems) at 37 °C for three minutes before a first absorbance reading was taken at 405 nm with a Tecan Infinite 200 PRO reader (Tecan, Crailsheim, Germany). A second reading was taken after 5 min incubation at 37 °C and the increase in absorbance was transformed to sAA concentration (U/mL), using a standard curve prepared using "Calibrator f.a.s." solution (Roche Diagnostics, Mannheim, Germany). Salivary cortisol concentrations were determined in duplicate using chemiluminescence immunoassay (CLIA, IBL, Hamburg, Germany). Intra- and inter-assay coefficients of variation were below 10% for both sAA and cortisol.

Analytic strategy
All analyses were conducted with IBM SPSS Statistics 26 and 28 (Chicago, Illinois, USA). In the first step of data processing, potential missing values and outliers were checked. Four missing values for paper-and-pencil questionnaires (PANAS, PASA) were found. The Little test showed that these values were missing completely at random (p > .05); subscale mean values were calculated without the missing value. One participant did not complete the PASA during the TSST and hence was excluded listwise from further analyses. Further, we screened for outliers using z-scores. One participant had to be excluded from cortisol analyses due to an elevated baseline cortisol level (z > 3.29) at -1 min to the TSST; elevated values are indicative of anticipatory stress, and such participants are therefore likely unable to mount an HPA axis response, as previously described (Roos et al., 2019). Aside from that, one participant showed an elevated BMI value (z > 3.29), which was controlled for.
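The z-score screening with the cutoff of 3.29 can be sketched as below. The study used SPSS; this standard-library version, the function name `flag_outliers`, and the example values are illustrative assumptions only.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=3.29):
    """Return one boolean per value: True if |z| exceeds the threshold.

    z-scores are computed with the sample standard deviation.
    """
    centre = mean(values)
    spread = stdev(values)
    if spread == 0:
        # All values identical: nothing can be an outlier
        return [False] * len(values)
    return [abs((value - centre) / spread) > threshold for value in values]


# Hypothetical baseline cortisol values (nmol/L): 19 typical values and
# one clearly elevated value at the end.
baseline = [5.0] * 19 + [40.0]
print(flag_outliers(baseline)[-1])  # True
```

Note that with the sample standard deviation the largest possible |z| in a sample of n values is (n - 1) / sqrt(n), so a cutoff of 3.29 can only be exceeded when n is at least 13; with the present sample size this condition is met.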
To check for normal distribution, Shapiro-Wilk tests were used. Cortisol and amylase values were log-transformed to reduce skewness. Subsequent analyses were conducted using the transformed values. Analysis of variance (ANOVA) for repeated measures was used to assess stress-induced changes in cortisol and sAA over time with "group" (first vs. second language) as between-subject factor and "time" (six time points) as within-subject factor. In all ANOVAs, Greenhouse-Geisser corrections were applied if the sphericity assumption was violated (Vasey & Thayer, 1987). The Levene test was used to test for variance homogeneity. For cortisol and sAA, the maximum increases were calculated since the two conditions differed in when they reached the cortisol and sAA peak. This approach takes the highest value for a participant out of two (for sAA) or three (for cortisol) consecutive stress response measures following the TSST. These are the times at which a peak can be expected, as established and previously described (Roos et al., 2019; Thoma et al., 2017). The baseline measure shortly before the TSST is then subtracted from the highest value to obtain the increase for each participant. This allows taking interindividual variation in peak timing into account. t-Tests were used to determine differences between the two groups' maximum increases. For the PANAS and the STADI, delta values were calculated. t-Tests or Mann-Whitney-U-tests between both groups were computed to analyze group differences in the maximum increases and changes in positive and negative affect.
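The maximum-increase approach can be sketched as follows. Several details are assumptions made for illustration: the natural logarithm is used for the transformation, the candidate peak samples are taken to be +10, +20 and +30 min for cortisol and +1 and +10 min for sAA (the text states only that three and two measures were considered), and the function name `max_increase` and the example values are hypothetical.

```python
import math

# Sampling times in minutes relative to the TSST, as described in the Procedure
SAMPLE_TIMES = (-1, 1, 10, 20, 30, 45)

# Assumption: which samples count as candidate peaks is not specified in the
# text; these index choices are illustrative only.
CORTISOL_PEAK_INDICES = (2, 3, 4)  # +10, +20, +30 min
SAA_PEAK_INDICES = (1, 2)          # +1, +10 min


def max_increase(samples, peak_indices, log_transform=True):
    """Highest candidate peak value minus the baseline sample (index 0)."""
    if len(samples) != len(SAMPLE_TIMES):
        raise ValueError("expected six samples per participant")
    # Assumption: natural log; the article does not name the log base
    values = [math.log(v) for v in samples] if log_transform else list(samples)
    baseline = values[0]
    peak = max(values[i] for i in peak_indices)
    return peak - baseline


# Hypothetical cortisol profile (nmol/L) of one participant
cortisol = [5.0, 6.0, 9.0, 12.0, 8.0, 6.0]
print(round(max_increase(cortisol, CORTISOL_PEAK_INDICES), 3))
```

Because each participant's own highest post-stress value is used, a participant who peaks at +10 min and one who peaks at +30 min are compared on the size of their response rather than on its timing.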
We considered sex, BMI and chronic stress as covariates in the hypothesis testing of differences in sAA and cortisol between groups, as previous research revealed their effect on the aforementioned physiological parameters (Kirschbaum et al., 1999; Kudielka & Kirschbaum, 2005; McInnis et al., 2014; Miller et al., 2007). The other control variables (depressive symptoms, smoking, age and intake of hormonal contraceptives as well as other language-specific variables) were tested for group differences between the two conditions (see Table 1) and for significant correlations with the outcome variables. If no group difference or correlation was found, they were not considered in further analyses due to small subsamples. Eight women reported taking hormonal contraceptives; one reported a hysterectomy. Two participants reported smoking, but less than 10 cigarettes per week, one in the first and one in the second language condition. Overall, results were considered significant at p < .05.

Preliminary analyses
Descriptive statistics and group differences in control variables are shown in Table 1. Control variables did not differ significantly between conditions except age. Therefore, age will be reconsidered as covariate in the following repeated measures ANOVAs. For the specific time of day participants visited the laboratory, no differences were observed for baseline cortisol and sAA levels or the maximum increases following the TSST. Zero-order correlations showed no significant association of "duration of stay in Germany," "age of language acquisition," "self-reported language level," "years spent studying German," language proficiency, cigarette smoking, or hormonal contraceptive intake with the dependent variables such as physiologic stress response, perceived threat or affect. Thus, they will not be considered in further analyses. Other zero-order correlations are shown in Table 2.
The repeated measures ANOVA for the sAA response was not significant; no differences between conditions were found (group by time interaction: F(5,150) = 0.79, p = .56, ηp² = .03; see Figure 1B). Group differences also remained non-significant using BMI, age, sex and chronic stress as covariates (BMI: F(5,145)

Associations with perceived threat, affect, state-anxiety
As Table 3 and Figure 2 show, participants who took part in the German TSST experienced significantly more perceived threat and stress, measured by the PASA. This occurred even before the TSST had actually started, but after the language of the test had been announced to the participant. Further, negative affect showed a stronger increase in the L2 compared to the L1 condition after the TSST, whereas positive affect showed a decrease as a result of the stressor. This decrease was stronger in the L2 condition, even though this difference was not significant. Moreover, participants in the second language condition showed a higher increase in state-anxiety compared to the first language condition, although group differences were not significant either.

Discussion
It is widely accepted that migrants face various stressors. Still, the influence of having to speak a second language in social-evaluative, acute stress situations is rather unexplored. We therefore investigated whether stress responses of Hispanic-Americans in an experimental setting were stronger if tested in their first language Spanish or a second language, German in this case. Results showed that all participants, independently of experimental condition, experienced a stress response following the TSST, for cortisol as well as for sAA. As expected, cortisol responses differed between conditions, with higher responses in the second compared to first language condition. For sAA, no significant differences between L1 and L2 TSSTs were found. For both sAA and cortisol, the stress response peak appeared later in the L2 compared to the L1 condition. This could be a sign of stronger aftereffects following acute stress in the second language and will be discussed below. Considering self-reported stress and threat, results showed higher ratings in the second compared to first language condition. Further, participants experienced increasing negative affect and, though not significantly, state-anxiety during the L2 TSST.
Note. a Mann-Whitney-U test, b t-Test for independent samples, c χ²-test; L1 = first language; L2 = second language; Language Level ranged from 1 = basic knowledge to 5 = approximate native language proficiency according to the CEFR; BMI: Body Mass Index; kg/m²: kilograms per meters squared; sAA: salivary Alpha-Amylase; PSS: Perceived Stress Scale; CES-D: Center for Epidemiologic Studies - Depression Scale.
Results confirm the assumption that speaking a second language induces a significantly stronger cortisol response than speaking a first language in the stress test situation. These results replicate the findings by Fischer et al. (2019) and add important insights, as Fischer and colleagues examined Swiss men only: On the one hand, our present study found stronger cortisol responses also for women; on the other hand, this study analyzed a complete immigrant sample that learned the second language in their late adolescence. Further, Fischer et al. (2019) compared Standard and Swiss German, which can be seen as varieties of the same language, whereas Spanish and German are clearly perceived to be two different and unrelated languages. Besides, it is imaginable that second language speakers feel more exposed as such in Germany than Standard German Swiss speakers feel in Switzerland, as most Swiss persons presumably experience a comparable Standard German learning process during their childhood or school career. Further, mechanisms similar to the FLA as well as the social-evaluative and uncontrollable character of the situation may have resulted in the observed cortisol increases (Horwitz et al., 1986). In this context, participants of the L2 condition may have faced greater acculturation pressure, which also led to HPA axis activation for rather younger speakers in a recent study (Legatzke & Gettler, 2021). Following from this aspect, previous acculturative stress experiences as well as migratory stressors may have intensified the stress response induced by the German TSST by activating stressful experiences or triggering threat (Flores et al., 2008; Lueck & Wilson, 2010). In contrast to cortisol, no differences between conditions were found for sAA. Fischer et al. (2019) did not find a significant difference for heart rate either, another autonomic marker.
Whereas cortisol responds especially to psychosocial and social-evaluative stressors, sAA and heart rate respond equally strongly to different stressors, especially to physical tasks (Skoluda et al., 2015). Fischer et al. (2019) argued that autonomic markers could have been "less sensitive to the incremental effect of second language use on the stress response" (p. 10). These effects are conceivable for the present study as well. Further, as Skoluda et al. (2015) described, sAA is affected quickly by environmental influences and therefore measurement errors might have occurred and obscured clear group differences. This may have been the case in the present study as groups already differed significantly before entering the stress test.
Nevertheless, for both cortisol and sAA, the German TSST led to delayed peaks in the stress responses compared to stress curves of the first language condition, which might be interpreted as an earlier stress recovery in the L1. Besides the acculturative stressors mentioned above, further variables may be responsible for the lasting impact of the L2 condition on the stress responses. As previous research showed, past experiences influence the extent of physiological stress responses. Flores et al. (2008) described that these experiences might contribute to the perception and interpretation of future situations as equally stressful and threatening. Past discriminatory experiences as well as experiencing language-based stigma of the L2 are known to lead to pronounced cortisol responses to the TSST as well as to perceived threat and reduced performance in experimental tasks (Birney et al., 2020; Busse et al., 2017; Thames et al., 2013). These higher cortisol responses while speaking the L2 could therefore be interpreted as adaptive functioning of the stress systems that were mobilizing resources to cope with the stressor (McEwen, 2019). The present study also found higher levels of perceived threat and increasing negative affect in the L2 condition. This might suggest that participants were aware of the possibility of experiencing language-based stigma while facing German TSST committee members. In any case, higher perceived threat and stress is known to be correlated with higher cortisol increases following the TSST. In addition, participants were not able to choose their preferred language for the job interview and arithmetic task and may have experienced a situation of uncontrollability (Kemeny, 2003). This could have reinforced the already uncontrollable and threatening TSST scenario.
Even though the current study examined acute stress situations instead of long-term consequences of L1 or L2 language use, it is conceivable that results underscore previous findings: In the social-evaluative situation simulated in our study, this may not primarily concern one's ethnic identity, but also the perception of oneself as competent or eligible for the dream job (despite the interview's fictitious nature). A job interview seems to be a situation where the use of the "legitimate language" (Bourdieu, 1977) is highly expected. Participants may feel pressure to adapt their speech to the "academic" or "institutional" norm of standard German, in the sense of the most prestigious norm whose "correct" form is codified in grammars and dictionaries (Sinner, 2020). For obvious reasons, this is a more complex task for German L2 speakers than for German L1 speakers and was, in our study, not influenced by participants' German language proficiency. Presumably, the TSST consists of stressful aspects which are not mitigated by advanced language skills. Thus, the participants may not only have struggled to produce the second language but also to be accepted and respected for their speech, which in a formal situation in Germany is likely to be evaluated against the proficiency of German L1 speakers. This is underlined by the fact that our sample showed higher L2 cortisol responses even though they had been learning German for more than eight years on average. Such a long period seems insufficient to eliminate the additional stressful nature of speaking a second language in an already stressful situation. However, our results could also indicate that being allowed to speak L1 might have protective effects on participants' health during challenging situations. Still, these potential underlying mechanisms need further research.
This study has several strengths and limitations. First, the examination of an immigrant sample made it possible to expand conclusions of previous findings about incremental stress by second language use. Second, despite the small sample size, strong effects for cortisol responses were found as well as significant differences in self-report measures such as perceived threat and negative affect. This shows that participants were aware of the stressful and challenging experience of second language use, even before the actual stress situation happened.
Still, in further research language acculturation or experienced language-based stigma should be measured to control for previous, potentially negative experiences with the second language. Moderation or mediation analyses could give new insights about these connections and hold important additional information about factors that made the second language condition more threatening and stressful. Subsequent studies should concentrate on bigger and more representative samples. This population was difficult to recruit, maybe due to our strict inclusion and exclusion criteria as well as rather few reachable persons in our region. Moreover, our sample was homogeneous (high education levels, high L2 language proficiency) and simultaneously heterogeneous considering other variables (pre- and peri-migratory stress experiences, different countries of origin with different political situations). When examining bigger samples, more control variables could be integrated into statistical analyses, such as ovarian cycle phases, country of origin, etc. Further, in our study L1 and L2 were used independently of experimental condition but differently depending on language: German investigators spoke German while questionnaires were presented in Spanish. In future studies, language needs to be controlled in the course of the study protocol and L1 or L2 should be chosen wisely to minimize priming effects.
As Fischer et al. (2019) already stated, the requirement of speaking a second language could serve as a constant minor stressor. Still, as this study's participants have already lived in Germany for about four years on average, the question is raised of how recent immigrants with a minimum level of German language proficiency may respond physiologically to the need of L2 use. In this context, it could also be of interest how this stressor develops over time as immigrated persons might acculturate, improve in L2 language proficiency, and become more confident in their linguistic skills. Especially taking stronger after-effects of L2 speaking into account, potential failures of the stress system, whether to turn off the stress response or to respond repeatedly and consistently intensely, could be of interest, as they might be connected to adverse health outcomes (McEwen, 2019). Further, this could have the practical implication that the development of specific programs should be encouraged, which may help to mitigate such stress over time for affected persons.

Conclusions
In summary, this study shows that being compelled to speak a second language in a laboratory setting that simulates a social-evaluative situation leads to experiences of stress, increased negative affect and perceived threat. Second language speaking also has the power to "add" stress to an already stressful and threatening experience. As the TSST resembles a "real-world situation," it is imaginable that simply speaking a second language in challenging situations, but also language-related difficulties, account for at least a considerable part of acculturative stress. Therefore, future research should deepen the knowledge about cortisol and sAA responses to second language speaking, especially in nonstressful situations as well as in longitudinal studies, and examine how the first language may serve as a protective factor against adverse health outcomes.