Introduction

Various subdisciplines of linguistics have long been concerned with the different mechanisms and principles of how languages influence each other. Here, we want to focus on lexicology, where the primary interest lies in the vocabulary, i.e., the collective lexicon, of a language. The phenomenon of lexical borrowings (or loanwords) has been discussed extensively (see Hoffer, 2002 for an overview). More specifically, processes of integration, i.e., adapting foreign lexical items to the system of the recipient language on the morpho-syntactic (Onysko, 2007) and semantic level (Field, 2002), are of special interest (for a more extensive overview, see Haspelmath, 2009). Here, a loanword is defined “as a word that at some point in the history of a language entered its lexicon as a result of borrowing (or transfer, or copying)” (Haspelmath, 2009, 36). So, the term borrowing “refers to a completed language change, a diachronic process that once started as an individual innovation but has been propagated throughout the speech community” (Haspelmath, 2009, 38). In contrast, “native” words are lexical items for which we have to assume, based on our knowledge of a language’s history, that they were not borrowed from another language.

Languages, or speech communities, allow innovation and propagation to different degrees so that a scale of receptivity “gives an indication of the level of acceptance or resistance to the imported loanwords”, more specifically, “an indication of the amount of language borrowing across time” and “an indication of the official resistance to the importation of loanwords” (Hoffer, 2005, 61). Resistance against loanwords (e.g., because they are felt to be a luxury and not a necessity, cf. Onysko and Winter-Froemel, 2011) is a sociological phenomenon, therefore bringing the subfield of sociolinguistics into the picture. The assumption here is that “a sociological process of acceptation” (Poplack and Sankoff, 1984, 101) is an integral part of the integration of loanwords into a recipient language. Chesley and Baayen (2010, 1344) discuss, among several quantitative criteria like dispersion and word length, the relevance of the cultural context in which a neologism is used. Poplack (2017) also lists linguistic and sociological criteria to assess the point when a borrowed word has been fully integrated into the lexicon of the recipient language. Finally, the inclusion of (new) loanwords into dictionaries of the recipient language is regarded by researchers as well as speakers as an indication that they are generally accepted as language norm (cf. Bańko and Hebal-Jezierska, 2014, Zgusta, 1971, 183–184).

Of course, not all words (either loanwords or “native” words) that emerge in a specific language at a specific time later diffuse into everyday language use; some are never lexicalized but remain nonce words. Neologisms, on the other hand, are lexical units (either newly borrowed or new compounds or derivatives) or meanings (for existing lexical units) which emerge in a speech community at a specific time and then diffuse. Although they are for some time still perceived as “new” by a majority of speakers, they are generally accepted as language norms (cf. Herberg et al. 2004, XII). While “(1) a nonce instantiation of a foreign word can be equated with the first stage of lexical innovation, (2) more frequently occurring foreign words represent a later stage in this development” (Poplack and Dion, 2012, 285).

When it comes to neologisms, we attribute to speakers the ability to perceive words as new, i.e., we think speakers have “neological intuition”, meaning that they have the “metalinguistic ability to evaluate lexical novelty” (Lombard et al. 2021, 1). In their study, which can be attributed to another subfield of linguistics, namely psycholinguistics, Lombard et al. demonstrate for French that neological intuition “depends on the neologism type, in particular, that it is stronger for morphological neologisms [new lexemes formed within a language] than for semantic ones [new meanings for existing lexemes], and stronger for irregular neologisms [formed with irregular lexical formation processes] than for regular ones” (2021, 13). In our study, we presume that recognizing a lexical item as new is also an indicator of accepting this item as part of the German language. We also presume that neologisms that are compounds or derivatives are accepted more easily by individual members of a speech community because they fit into existing semantic and morpho-semantic lexical patterns.

More generally, we are interested in the question of whether borrowed neologisms are accepted more slowly into the German language than German words resulting from the application of word formation rules, namely compounds. We build on prior evidence from a corpus-linguistic study (cf. Section “Prior work”). There, we found that neologisms consisting of foreign language material take longer than “native” neologisms to be perceived as an integral part of German—at least if we take the extent of linguistic marking as a proxy for acceptance and integration. In the present study, we outline a psycholinguistic approach to evaluate the (psychological) status of different neologisms and non-words in an experimentally controlled study. We aim at insights into the potential insecurities of participants regarding the status of German neologisms that are either formed as compounds in German or have been borrowed from English.

In the following section, we discuss prior work, and we derive our hypotheses in Section “Hypotheses”. Section “Method” explains our method (participants, stimuli and design, procedure), Section “Results” presents the results, preceded by remarks on data processing and selection and model fitting. We discuss the results in Section “Discussion”, and we conclude with some general considerations in Section “Conclusion and implications”.

Prior work

Hoffer (2002, 4; referring to Sapir 1921) points out that “the way a language reacts to foreign words, by accepting, translating, or rejecting them, may shed light on its innate formal tendencies as well as on the psychological reaction of the speakers who use it”. From a prescriptive point of view, borrowings are presumably not accepted as easily and/or quickly as new words formed according to the word formation rules in a specific language, so the latter should be given priority. In such puristic views, any borrowed lexical item in the recipient language needs to be “purified” (Hoffer, 2002, 5; also see Haugen’s (1950) early analysis of linguistic borrowing in this regard). For (sometimes openly prescriptive) discussions related to specific languages, see, for example, Klajn (2001) regarding Serbian, Kozlovska et al. (2020) regarding Ukrainian terminology, Mockiené and Rackevičiené (2016) regarding Lithuanian terminology, Raadik and Tuulik (2018) regarding Estonian as well as Xo’janova and Jomg’irovna (2023) regarding Uzbek.

From a descriptive point of view, borrowings are seen as a part of the recipient language’s lexicon in their own right that possibly enriches lexical choices (see, for example, Baklanova (2004) regarding Tagalog and Olko (2015) regarding colonial Nahuatl). They may even be used intentionally (e.g., for reasons of prestige) by certain speakers. Grant-Russell and Beaudet (1999) describe this for French words in written English in Quebec, Hoffer (2002) for functions of English loanwords in Japanese. Hurtado Miret (2021) reports similar findings for English borrowings in Catalan social media texts, Balakina and Visilitskaya (2015) for English borrowings in the Russian business language. Linder and De Sterck (2016) show an “ambivalent attitude” of scientists based in Spain who use new English terminology in research articles but try to find “naturalized” Spanish terms for their Spanish texts. Finally, in a diachronic view, borrowings can be seen as reflexes of cross-cultural transfer (cf. Olko 2015 on borrowings from Spanish in colonial Nahuatl).

Against the background of these different assessments of the status of borrowings and the general intuition that they are accepted less easily and quickly by speakers as part of their general vocabulary, we asked ourselves in Klosa-Kückelhaus and Wolfer (2020), whether neologisms borrowed from English are indeed accepted more slowly into the German general language than new German compounds and derivatives.

We presented data on the frequency development of 239 German neologisms of both types from the 1990s in a German reference corpus (DeReKo, Kupietz et al. 2018) based on the assumption that “an important diagnostic for the incorporation of a form into the native lexicon is the increased frequency of its usage” (cf. Poplack and Sankoff, 1984, 101, Backus, 2014, 25). We also studied the frequency development in the use of linguistic markers (“flags”, cf. Poplack et al. 1988, 1178, Palmer and Harris, 1990, Grant-Russell and Beaudet, 1999) with these words within a timeframe of roughly thirty years based on the assumption that the use of flags (such as quotation marks or hedge words like so-called) is abandoned once the process of lexicalization of new words has been completed (cf. Lemnitzer, 2010, 69). However, in the press texts in DeReKo, German neologisms are marked by quotation marks less often (possibly due to journalistic conventions) when the whole timeline from 1990 to 2017 is taken into account. Newer research (cf. Winter-Froemel, 2023) has shown that lexical borrowings are indeed typically less accessible (regarding their form and meaning), which is why speakers use flagging and frame information and, to a lesser extent, also metalinguistic comments to signal alterity.

We found no clearly distinguishable pattern in the frequency development for borrowed neologisms and those of German origin. Thus, we concluded that frequency development alone could (probably) not be used as an indicator for the acceptance of a neologism in German (at least not simple relative frequencies in a given timeframe). However, we found that although both borrowed new lexemes and new “native” word formation products are used with linguistic markers, borrowed neologisms are marked more frequently, especially when they first emerge. The use of these markers decreases over time so that after approximately three decades, there is hardly any difference left in the use of markers for borrowed neologisms or those formed (by composition or derivation) in German. We have also shown a negative correlation between overall corpus frequency and overall probability of flagging. We concluded that flagging can be used as an indicator of the acceptance of a neologism in German.

Our corpus-based study left several questions unanswered, and we discussed how to follow up, possibly by interviewing speakers and collecting their opinions on the acceptance of the analyzed neologisms in a field test or by psycholinguistic approaches. Similarly, Poplack et al. (1988) focused on the social correlates of lexical borrowing and found in their interview-based corpus study that younger speakers in the French-speaking areas of Canada tended to use English loanwords more than older speakers. Crombez et al. (2022) found in their research on the selection of Anglicisms or their Dutch alternatives that the oldest age group (between 51 and 70) was least likely to select the English lexeme in a forced-choice experiment. Soares da Silva (2014) presented data from a survey in which he looked at “how knowledge of the origin of words corresponds to actual language attitudes” (2014: 127). Contact linguistics, more generally, has also used cognitive approaches and focused on speakers and their role in the process of lexical borrowing (cf. Backus, 2020, Hakimov and Backus, 2021, Quick and Verschik, 2021).

With the present study, we aim to extend the methods used to explore the acceptability of borrowed vs. native neologisms and set up an experimental study in a mouse-tracking paradigm. We asked the participants for their assessment of whether certain words “are being used”, and the mouse-tracking paradigm allows us to draw conclusions about the confidence associated with the individual assessments. We assume that greater uncertainty (or less confidence) is associated with a less “entrenched” mental representation of the respective word (for the concept of entrenchment, cf., among others, Langacker, 1987). From this, in turn, we try to deduce how strongly the word is integrated into the language as a whole.

In comparing neologisms of 25 to 30 years ago with other more recent ones, we accounted for the fact described above that the integration of borrowings into the lexicon of a language takes time. We specifically aimed at a wide age range of our participants to account for different attitudes and/or experiences towards borrowed neologisms. Finally, we are also interested in the question of converging evidence (cf. Schönefeld, 2011), i.e., to what extent results from corpus studies are generalizable to experimental paradigms of language processing.

Hypotheses

As indicated above, in our corpus study contrasting borrowed (“English”) neologisms with neologisms that consist solely of German source material (“German neologisms”), we found that English neologisms are flagged, i.e., linguistically marked, more often in the initial years after the neologism is introduced into the language. This difference becomes smaller and smaller the more time passes, that is, the longer the neologisms are in use. We have interpreted this to suggest that neologisms consisting of foreign language material take longer to be perceived by members of the speech community as an integral part of their own language. Moreover, we assume that the initial unfamiliarity with borrowed neologisms translates into uncertainty in decision-making during a behavioral experiment. Therefore, we assume an interaction effect (H3) for the present study, which we will describe below.

H1: Acceptance rates should be lower for borrowed (“English”) neologisms and, if participants accept the English neologism, reaction times should be higher than for German ones.

H2: We expect the uncertainty operationalized by the mouse trajectory variables to be higher for accepted English neologisms than for accepted German neologisms.

H3: The effects implied by H1 and H2 should be attenuated (or even absent) for neologisms from the 1990s because we expect them to be already entrenched in the German language, regardless of the source language.

Note that these hypotheses are based on some crucial assumptions that we want to state explicitly again: (1) less flagging is an indicator that speakers of a language accept a certain word more strongly as part of their language, (2) if a word is not yet fully perceived as part of the language, this leads to uncertainties when accepting the word in an experiment, and (3) these uncertainties can be captured in a mouse-tracking paradigm.

Method

Participants

We collected data from 80 participants between July and December 2022 (Footnote 1). We recorded the year of birth and whether German was their first language. Gender was not recorded because we did not have any gender-related hypotheses and we wanted to collect as little personal data as possible. For the current study, we excluded all 13 participants who reported that German was not (one of) their first language(s). Due to a technical problem that resulted in no data being recorded, we had to exclude one further participant. Thus, the data analyzed here is based on the responses of 66 participants. The age of the participants ranges from 15 to 85 years (mean: 36.7 years, median: 25.5 years). This shows that there is a slight imbalance towards younger participants. In what follows, we will differentiate between participants born before 1980 and those born in 1980 or later. In this way, we want to distinguish between participants from Generation Y (also called “millennials”) and earlier generations. Forty-four participants were born in 1980 or later and 22 were born before 1980.

Stimuli and design

As linguistic stimuli, we chose neologisms recorded in a German dictionary on neologisms (Leibniz-Institut für Deutsche Sprache, 2006, cf. Steffens, 2017) from the 1990s and the 2010s either borrowed from English or formed from German material. We only chose nouns because all other parts of speech represent only a little more than 10% of the entries in the above-mentioned neologism dictionary. All words formed in German were compounds; most of the English loanwords were compounds as well (words like English touch screen are conceived as one word in German and spelled without a space), but due to the limited choice of candidates in the neologism dictionary, some were also derivatives. All of the English test words are attested in English and not formed in German only (so-called “pseudo anglicisms”). We chose a range of test words within similar frequency ranges in the German reference corpus, DeReKo. However, corpus frequency was also entered as a covariate into the statistical models (see Section “Model fitting”).

Each participant saw 24 items (= words) in 6 experimental conditions. This is a total of 144 experimental runs per participant. The six experimental conditions result from a 2 × 3 design in which the factors of time/status (1990s, 2010s, Pseudo) and origin (English vs. German) were crossed. In addition to the visual presentation, we played an audio recording of each word (Footnote 2).

The factor “Time/status” varied the decade (1990s vs. 2010s) when the respective neologisms became established in the German language (as recorded by the neologism dictionary). In addition, we presented pseudo neologisms (factor level “Pseudo”) that sound like neologisms and follow the same word formation rules but are not attested in corpora of the German language. This was necessary because the participants also had to have the chance to reject some words during the experiment (see Section “Procedure”). Otherwise, the participants could have simply always given a positive answer, which could have distorted the results. See Appendix A for a list of the stimulus words. We provide literal English translations for German neologisms.

The factor “Origin” varied whether English or German source material was used when forming the new lexical items. Note that we presented all compounds according to German orthography, i.e., written in one string (without spaces between the compound parts), in four instances with a hyphen as connecting element (Bubble-Tea, Burn-out, Cross-fit, and No-Stream-Area). Table 1 gives an overview of the experimental design with three example stimuli for each cell. In some instances, we will later refer to the factor “Time/status” as “Time” only because we exclude the pseudo neologisms from the analyses. For the sake of simplicity, we will refer to the neologisms that are a product of word formation with German elements only as “German neologisms” and to the ones with English source material as “English neologisms”, at the same time being fully aware that these so-called “English neologisms” are a part of the German language.

Table 1 Design of the study crossing the two factors time/status (3 levels) and origin (2 levels).

Since the experimental items varied between all conditions, we presented all items to all participants. Each experimental list was randomized, and we made sure that a maximum of two consecutive trials belonged to the same condition.
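The list construction described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to build the experimental lists: the item labels are placeholders rather than the stimulus words, and the reshuffle-until-valid strategy is only one assumed way of enforcing the constraint of at most two consecutive trials from the same condition.

```python
import random
from itertools import product

TIME_STATUS = ["1990s", "2010s", "Pseudo"]  # factor "Time/status"
ORIGIN = ["English", "German"]              # factor "Origin"
ITEMS_PER_CONDITION = 24
MAX_RUN = 2  # at most two consecutive trials from the same condition


def build_trials():
    """Cross both factors and create placeholder items for each design cell."""
    trials = []
    for time_status, origin in product(TIME_STATUS, ORIGIN):
        for i in range(ITEMS_PER_CONDITION):
            # Placeholder labels (hypothetical), not the real stimulus words.
            label = f"{time_status}-{origin}-{i + 1}"
            trials.append({"condition": (time_status, origin), "item": label})
    return trials


def longest_run(trials):
    """Return the longest streak of consecutive trials sharing a condition."""
    longest = current = 0
    previous = None
    for trial in trials:
        current = current + 1 if trial["condition"] == previous else 1
        previous = trial["condition"]
        longest = max(longest, current)
    return longest


def randomize(trials, max_run=MAX_RUN, seed=None):
    """Reshuffle the list until no condition repeats more than max_run times."""
    rng = random.Random(seed)
    order = list(trials)
    while True:
        rng.shuffle(order)
        if longest_run(order) <= max_run:
            return order


if __name__ == "__main__":
    order = randomize(build_trials(), seed=1)
    print(len(order), longest_run(order))
```

Rejection sampling of this kind is easy to verify because every accepted list satisfies the constraint by construction; with six conditions of 24 items each, a valid order is typically found after a modest number of reshuffles.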

Procedure

The participants were seated in front of a 15-inch Windows 10 notebook with a cable-based mouse, a screen resolution of 1920 by 1080 pixels, 8 GB of memory, and an Intel i5-6200U CPU. Participants used headphones to listen to the recording of the words. We used the same setup for all participants. To record the mouse movements, we used the free software MouseTracker (Freeman and Ambady, 2010). Participants operated the mouse with the hand they would normally use to operate a computer mouse.

The first screen of the experiment was the instruction. Here, participants were introduced to the experimental paradigm and were given the question they should focus on during the experiment: Ist das ein Wort, das verwendet wird? (Engl.: “Is this a word that is being used?”) (Footnote 3). After the instruction, participants could ask the experimenter questions to clarify any uncertain points.

Following a button press, four training items (Footnote 4) were presented, which followed the same procedure as the experimental items: first, only a box labeled “START” was presented on the bottom center of the screen. Upon clicking this box, the two response buttons labeled “JA” (‘yes’) and “NEIN” (‘no’) were presented in the upper left and right corners of the screen and mouse trajectory recording was started. The position of “JA” and “NEIN” was switched for every other participant of the experiment to avoid systematic position biases. The stimulus word was presented at the same time as the response buttons in the center of the screen and an audio file with the pronunciation of the word was played simultaneously. Mouse trajectory recording stopped as soon as the participant gave a response. The response and the overall reaction time were recorded. After the four training items, another screen notified the participants that the experiment was about to start. On this screen, we repeated the focus question and again gave them the opportunity to ask questions about the course of the experiment. Another button press started the experiment. After half of the experimental trials, we inserted a screen announcing a short break. After another button press, the experiment continued with the second half of the experimental items.

Results

We will first go into some detail regarding the selection and prior processing of the response data. Here, we will also describe the dependent variables we base our analyses on. In Section “Model fitting”, we will describe the model-fitting process.

Data processing and selection

First, all trials with an initiation time (the time in this trial before the participant started to move the mouse) higher than 1 second were excluded (4.55% of all trials; Footnote 5). From this dataset, all trials with a log-transformed reaction time outside the range of the mean log reaction time ± three standard deviations were excluded. This affected 1.26% of the experimental trials with an initiation time of 1 second or shorter. The final dataset contained 8958 trials. Further outlier corrections specific to certain dependent variables are reported at the relevant place.
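The two exclusion steps can be sketched in Python as follows. This is an illustration rather than the original processing script: the field names initiation_ms and rt_ms are hypothetical, and the sketch assumes that the mean and the (sample) standard deviation of the log reaction times are computed over all remaining trials pooled together, which the text does not specify.

```python
import math
from statistics import mean, stdev


def filter_trials(trials, max_initiation_ms=1000.0, sd_cutoff=3.0):
    """Apply the initiation-time and log reaction time exclusions in order."""
    # Step 1: drop trials whose initiation time exceeds 1 second.
    kept = [t for t in trials if t["initiation_ms"] <= max_initiation_ms]

    # Step 2: drop trials whose log reaction time lies outside
    # mean +/- 3 standard deviations (pooled over trials; an assumption).
    log_rts = [math.log(t["rt_ms"]) for t in kept]
    centre = mean(log_rts)
    spread = stdev(log_rts)
    lower = centre - sd_cutoff * spread
    upper = centre + sd_cutoff * spread
    return [t for t, log_rt in zip(kept, log_rts) if lower <= log_rt <= upper]


if __name__ == "__main__":
    # Synthetic demonstration data, not the experimental data.
    demo = [{"initiation_ms": 300.0, "rt_ms": 900.0 + 4 * i} for i in range(50)]
    demo.append({"initiation_ms": 1500.0, "rt_ms": 1000.0})  # late initiation
    demo.append({"initiation_ms": 300.0, "rt_ms": 60000.0})  # extreme reaction time
    print(len(demo), "->", len(filter_trials(demo)))
```

Applying the steps in this order matters: the reaction time bounds are estimated only from trials that already passed the initiation-time criterion, which mirrors the wording "from this dataset" above.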

In the analyses reported below, we restrict the dataset to non-pseudo words because, as described in Section “Stimuli and design”, we included the pseudo neologisms only so that participants had the opportunity to correctly reject some of the stimulus words (Footnote 6). The pseudo neologisms were rejected in 76.5% of all trials (73.2% for English, 79.7% for German), “real” neologisms were rejected in 27.2% of all trials. The dataset without responses to the pseudo neologisms consists of 5988 trials. This excludes two cells from our experimental design and we thus refer to the factor “Time/status” as “Time” (1990s vs. 2010s) from now on.

We processed the mouse trajectory data with the R (R Core Team, 2022) packages Readbulk (Kieslich and Henninger, 2016) and Mousetrap (Wulff et al. 2021) and followed the steps described in Wulff et al. (2021), namely remapping the trajectories, aligning them to a common start as well as time- and space-normalizing them. We used the default values for all respective functions.
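To make the idea of time normalization concrete, the following Python sketch resamples a raw trajectory to a fixed number of equally spaced time steps by linear interpolation. It is a simplified stand-in for the R pipeline named above, not a reimplementation of it; the function name and the default of 101 steps (the number of time-normalized points used for the trajectory measures) are assumptions made for illustration.

```python
def time_normalize(times, xs, ys, n_steps=101):
    """Resample a trajectory to n_steps equally spaced time points.

    times must be sorted in ascending order; xs and ys hold the recorded
    cursor positions. Positions between samples are linearly interpolated.
    """
    if len(times) < 2 or not (len(times) == len(xs) == len(ys)):
        raise ValueError("need at least two samples of equal length")
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")

    t_start, t_end = times[0], times[-1]
    resampled = []
    j = 0  # index of the recorded sample to the left of the current time
    for k in range(n_steps):
        t = t_start + (t_end - t_start) * k / (n_steps - 1)
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        span = times[j + 1] - times[j]
        weight = 0.0 if span == 0 else (t - times[j]) / span
        weight = min(max(weight, 0.0), 1.0)  # guard against rounding error
        x = xs[j] + weight * (xs[j + 1] - xs[j])
        y = ys[j] + weight * (ys[j + 1] - ys[j])
        resampled.append((x, y))
    return resampled


if __name__ == "__main__":
    points = time_normalize([0.0, 10.0, 20.0, 40.0],
                            [0.0, 1.0, 2.0, 4.0],
                            [0.0, 2.0, 4.0, 8.0])
    print(len(points), points[50])
```

After this step every trial is represented by the same number of points regardless of how long the response took, which is what makes point-wise measures comparable across trials.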

The dependent variables we will be analyzing in the remainder of this section are acceptance of the stimulus word (binary yes/no responses), log-transformed reaction times, flips (directional changes of the trajectory) on the x-axis, the maximum absolute deviation (MAD), defined as the maximum value of all Euclidean distances from a hypothetical direct response path, the ideal(ized) mouse trajectory, to each of the 101 time-normalized points of the trajectory (cf. Koop and Johnson, 2013, 158), as well as the average deviation (AD), which is the mean of all the aforementioned distances. Results for the area under the curve (AUC), which is defined as “the area between the observed and idealized trajectory” (Wulff et al. 2021, 9), are reported in the Supplementary Material because the effect pattern is very similar to AD. Flips on the x-axis are considered a complexity index, whereas MAD is a single-point measure among the curvature indices. AD and AUC integrate deviations for all points in the trajectory and are thus called integrative curvature indices (cf. Wulff et al. 2021, 9). The dependent variables are correlated with each other, but not to the extent that they measure exactly the same thing. The highest Spearman correlation can be observed for MAD and AD (ρ = 0.920), the lowest for reaction times and AUC (ρ = 0.026).
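The trajectory-based measures defined above can be illustrated with a short Python sketch. It follows the verbal definitions given in this section (unsigned distances from the straight line between the first and the last point of a trajectory); dedicated packages may compute signed deviations or handle ties differently, so the values need not match their output exactly. All function names are hypothetical.

```python
import math


def deviations(points):
    """Perpendicular distance of each point from the line start -> end."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        raise ValueError("start and end point coincide")
    return [
        abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length
        for x, y in points
    ]


def x_flips(points):
    """Count changes of movement direction along the x-axis."""
    flips = 0
    previous_sign = 0
    for (xa, _), (xb, _) in zip(points, points[1:]):
        step = xb - xa
        sign = (step > 0) - (step < 0)
        if sign == 0:
            continue  # no horizontal movement, direction unchanged
        if previous_sign != 0 and sign != previous_sign:
            flips += 1
        previous_sign = sign
    return flips


def trajectory_measures(points):
    """Return MAD, AD, and x-axis flips for one trajectory."""
    dists = deviations(points)
    return {
        "MAD": max(dists),              # maximum absolute deviation
        "AD": sum(dists) / len(dists),  # average deviation
        "x_flips": x_flips(points),
    }


if __name__ == "__main__":
    demo = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0), (2.0, 3.0), (0.0, 4.0)]
    print(trajectory_measures(demo))
```

In this toy trajectory the cursor travels straight upwards overall but swings left and right on the way, which produces a MAD of 2, an AD of 0.8, and three flips; a response without any hesitation would yield values close to zero on all three measures.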

Model fitting

All statistical models were fitted using the R package lme4 (Bates et al. 2015a). For each dependent variable, we started with a full model, containing random intercepts for participants and item as well as by-participant random slopes for trial position. The fixed effect structure contained all possible interactions (including the 3-way interaction) between our two experimental factors “Time” and “Origin” as well as the age group (born before 1980 vs. born in 1980 or later). Additionally, the length and (log-transformed) corpus frequency of the stimulus word (hereinafter referred to as “log frequency”) as measured in the German Reference Corpus DeReKo were entered as fixed covariates. Trial position, word length, and log frequency were scaled and centered. Models were then reduced to a final model, which only included predictors and covariates that contributed to the goodness-of-fit of the model (Footnote 7). Model comparisons were based on likelihood ratio tests. For the final models, we report fixed effect estimates with 95% confidence intervals (CIs; Footnote 8) as well as random effect variances and marginal and conditional R² values given by the sjPlot (Lüdecke, 2022) R package (Footnote 9). The former measures the amount of variance that is explained by fixed effects alone, the latter includes random effects in the calculation. We use the ggplot2 (Wickham, 2016) and ggeffects (Lüdecke, 2018) R packages to visualize the model estimates. We do not include covariates in the model plots.
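The logic of a likelihood ratio test between a fuller and a reduced model can be sketched in Python. The models themselves were fitted in R with lme4; this sketch only illustrates the arithmetic of the comparison, and the log-likelihood values in the demonstration are invented. The chi-square tail probability is computed with standard-library functions for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of the chi-square distribution (integer df >= 1)."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    df = int(df)
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Closed form for even degrees of freedom.
        terms = sum(y ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-y) * terms
    # Odd degrees of freedom: complementary error function plus a finite series.
    tail = math.erfc(math.sqrt(y))
    series = sum(
        y ** (i - 0.5) / math.gamma(i + 0.5)
        for i in range(1, (df - 1) // 2 + 1)
    )
    return tail + math.exp(-y) * series


def likelihood_ratio_test(loglik_full, loglik_reduced, df_diff):
    """Return the test statistic and p-value for two nested models.

    df_diff is the number of parameters dropped in the reduced model.
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, df_diff)


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only.
    stat, p = likelihood_ratio_test(-1200.0, -1203.0, df_diff=1)
    print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

A small p-value indicates that dropping the term in question makes the model fit reliably worse, so the term would be retained; otherwise the simpler model would be preferred during model reduction.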

Acceptance rates

No additional outlier correction was carried out for the logistic mixed-effects regression model of stimulus word acceptance. The final model contains random intercepts for stimulus word and participant as well as a fixed effect for log frequency. The three-way interaction between time, origin, and age group significantly contributes to the model fit. Consequently, we kept all lower-order interactions and single effects of these predictors. Table 2 summarizes the model results.

Table 2 Model results for the mixed-effects logistic regression model predicting acceptance of the stimulus word (CI = confidence interval).

The log frequency covariate indicates that stimulus words that are more frequent in the corpus have a higher chance of being accepted. Since the three-way interaction is included in the final model, we directly refer to the model plot (see Fig. 1) for interpretation. Altogether, German neologisms have a lower estimated probability of acceptance—an effect that is especially pronounced for the younger participants and only slightly attenuated for neologisms from the 1990s and older participants. Also, English neologisms from the 2010s are slightly less acceptable for older participants than the ones from the 1990s.

Fig. 1: Model plot for the final mixed-effects logistic regression model predicting stimulus acceptance.

Predictions for the two participant groups are distributed over panels. The experimental factor “Time” is coded on the x-axis. The experimental factor “Origin” is color-coded. Error bars symbolize 95% confidence intervals. This is the case for all model plots.

Reaction times

No additional outlier correction was carried out for the linear mixed-effects regression model of log-transformed reaction times. However, we restricted the analysis of reaction times to trials where the participant accepted the word (Footnote 10). There are two reasons why we did this (not only for reaction times but for all following dependent measures, too): (1) It is reasonable to assume that reaction times from trials where the word has been accepted are systematically different from the trials where the word has been rejected (cf. Proctor et al. 1984). Including them all in one model would thus make interpretation considerably more complicated because (2) all effects would be modulated by accepting/rejecting the word, which in turn would lead to higher-order interactions in the models. Indeed, tests we carried out beforehand showed exactly this. The number of trials included in the calculation of the model for the reaction times, therefore, decreases to 4362.

Table 3 shows that the covariate effects for word length (the longer the word, the higher the reaction time), log frequency (the more frequent the word, the lower the reaction time), and trial position (the later in the experiment, the faster the reaction time, but note that trial position is also included as a by-participant random slope) are included in the final model. Also, two 2-way interactions and, thus, the lower-order single effects are included.

Table 3 Model results for the linear mixed-effects regression model predicting reaction times in trials where the stimulus word was accepted.

When we refer to Fig. 2 for the interpretation of the remaining factors, we see that the younger participants took (around 400 milliseconds) longer to accept German neologisms than English neologisms. This effect is also hinted at for the older participants, but the differences for those are much smaller and—as the interaction suggests—should not be interpreted. The Time × Age group interaction suggests that the effect of the decade when the neologism was first observed differs between older and younger participants. However, pairwise comparisons do not suggest any interpretable differences.

Fig. 2: Model plot for the final linear mixed-effects regression model predicting log-transformed reaction times.

All variables are coded as in Fig. 1.

Flips on the x-axis

For each of the dependent measures based on the mouse trajectories, we employed an outlier correction for the respective measure only. With this, we wanted to make sure that no completely unreasonable trials enter the analysis. We treated all data points outside the mean value ±3 standard deviations as outliers (note that we did the outlier correction after excluding pseudo-neologism and rejection trials). For the flips on the x-axis, this affected 45 (1.03%) data points. A total of 4317 trials enter the analysis of the number of flips on the x-axis for trials where the word was accepted.

Table 4 shows that more frequent words were associated with fewer flips. The trial position only remained in the model as a fixed effect because its inclusion as a by-participant random slope improved the model fit. Crucially, no effect of “Time” remains in the final model, but the two-way interaction between “Origin” and “Age group” (as well as their lower-order single effects) contributes to explaining variance in the number of flips.

Table 4 Model results for the linear mixed-effects regression model predicting flips on the x-axis in trials where the stimulus word was accepted.

Fig. 3 shows that this interaction can be attributed to the younger participants, who change direction on the x-axis (around 1.3 times) more often when accepting German neologisms than English ones—an effect which cannot be observed for the older participants.

Fig. 3: Model plot for the final linear mixed-effects regression model predicting the number of flips on the x-axis.

Predictions for the two participant groups are coded on the x-axis. The experimental factor “Origin” is color-coded.

Maximum absolute deviation (MAD)

For the MAD, 43 (1.05%) data points were identified as outliers. 4359 trials enter the analysis of MADs for trials where the word was accepted. Only the trial position remained as a covariate in the final model for the MAD (see Table 5 for the model results). The later in the experiment, the higher the MAD was overall. Also, the trial position remained in the model as a by-participant random slope.

Table 5 Model results for the linear mixed-effects regression model predicting the MAD in trials where the stimulus word was accepted.

The remaining fixed effects roughly follow the same pattern as for the number of x-axis flips. Fig. 4 again shows that we did not find any reliable difference between the MADs for English and German neologisms for the older age group. However, the MADs for the German neologisms are, again, considerably elevated for the millennials.

Fig. 4: Model plot for the final linear mixed-effects regression model predicting the maximum absolute deviation from the optimal mouse trajectories.

All variables are coded as in Fig. 3.

Average deviation

For the AD, 88 (2.02%) data points were identified as outliers. A total of 4274 trials enter the analysis of ADs for trials where the word was accepted. Longer stimulus words were associated with smaller AD values. The trial position only remained in the model as a fixed effect because its contribution as a by-participant random slope was relevant. As Table 6 and Fig. 5 show, the overall effect pattern again replicates the ones for x-axis flips and MADs: there is no observable difference for the older age group, but the younger participants show elevated ADs for German neologisms. The decade the neologism came into use in the German language was not included as a predictor in the final model.

Table 6 Model results for the linear mixed-effects regression model predicting the AD in trials where the stimulus word was accepted.

Fig. 5: Model plot for the final linear mixed-effects regression model predicting the average deviation from the optimal mouse trajectories.

All variables are coded as in Fig. 3.
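To make the three trajectory measures more concrete, the following sketch computes x-axis flips, MAD, and AD for a single trajectory. It assumes that the optimal trajectory is the straight line from the first to the last recorded point and that deviations are perpendicular distances from that line; all function names and the example coordinates are hypothetical and do not reproduce the software used in the study.

```python
import math

def x_flips(xs):
    """Count reversals of movement direction along the x-axis."""
    flips, prev_sign = 0, 0
    for a, b in zip(xs, xs[1:]):
        dx = b - a
        if dx == 0:
            continue  # no horizontal movement, direction unchanged
        sign = 1 if dx > 0 else -1
        if prev_sign != 0 and sign != prev_sign:
            flips += 1
        prev_sign = sign
    return flips

def deviations(xs, ys):
    """Signed perpendicular distances from the straight start-to-end line."""
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        raise ValueError("start and end point coincide")
    return [((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)) / length
            for x, y in zip(xs, ys)]

def mad(xs, ys):
    """Deviation with the largest absolute value (sign retained)."""
    return max(deviations(xs, ys), key=abs)

def ad(xs, ys):
    """Mean of the signed deviations over all recorded points."""
    d = deviations(xs, ys)
    return sum(d) / len(d)

# Hypothetical trajectory (illustrative coordinates, not study data)
xs = [0, 1, 0, 2, 4]
ys = [0, 1, 2, 3, 4]
print(x_flips(xs), mad(xs, ys), ad(xs, ys))
```

Under these assumptions, a trajectory that follows the straight line exactly yields zero flips and a MAD and AD of zero, so larger values indicate more curved or more hesitant movements.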

Discussion

A reasonably clear picture of the effects of the experimental predictors can be seen. The decade when the neologisms had been introduced into broader use within the speech community (factor “Time”) only has an effect on acceptance rates: older participants tend to accept older (from the 1990s) English loanword neologisms more often than newer ones (from the 2010s). This is partially consistent with our hypotheses, although we did not formulate a hypothesis specific to the age group of the participants. In general, younger generations (who claim not to be bothered by English influence) seem to be more accepting of the use of English neologisms in Dutch (cf. Crombez et al. 2022, 998). So, for further studies, the age of participants should definitely be taken into account. Apart from that, the decade when a neologism was introduced does not show any effects on any of the dependent variables. The two-way interaction effect between time and age group for reaction times proved too inconsistent to yield any interpretable effects.

The other experimental factor, “Origin”, however, shows significant influences on all dependent measures. Overall, participants rejected native, i.e., German, neologisms more often, and they responded more slowly to them. Also, all mouse trajectory variables (x-axis flips, MAD, AD, and AUC) indicate that uncertainty in the decision to accept these neologisms was higher. It is important to note that, apart from the acceptance rates, this is only true for the younger participants (the millennials), as indicated by the interactions present in all the trajectory variables. No effects of origin could be observed for the participants born before 1980.

Higher rejection rates, slower responses, and more acceptance-related uncertainty all stand in sharp contrast to the hypotheses we derived from our corpus-linguistic study (Klosa-Kückelhaus and Wolfer, 2020). There we found, in a nutshell, that borrowed neologisms are flagged (i.e., linguistically marked with accompanying explanations or quotation marks) more often than German ones when they are introduced into the language. This difference, however, gets smaller as time progresses. In other words, apart from the effect of the factor Time on acceptance rates, none of the hypotheses we formulated is supported by the behavioral data. In fact, the contrast between native and borrowed neologisms is exactly opposite to what we expected, and we found an interaction different from the one we formulated in H3. And even though the lexical material used in the two studies differs considerably, the strength and robustness of the negative effects we found for German neologisms in the present study cannot be explained by differences in the investigated material alone. Therefore, we would like to discuss possible explanations for this discrepancy in what follows.

We want to start out with some anecdotal evidence we picked up while debriefing the participants. Some of them claimed that they were more “lenient” when accepting the borrowed neologisms than the German ones. One could also say that the borrowed neologisms enjoy the “benefit of the doubt,” and whether they are rejected or accepted is guided by the general attitudes of the individual participants (where the age effect is observable). As soon as they had the feeling that they had ever heard or read a borrowed neologism, they accepted it without giving it too much thought (that, by the way, is the behavior we were trying to elicit with our instruction). On the other hand, when they encountered a German neologism, they were much more likely to (try to) recall the exact meaning of the word, an undertaking that might be especially difficult for neologisms. This additional cognitive effort might have induced the effects we are seeing in the present study. However, we did not systematically collect this type of feedback. Also, these are participants’ self-assessments, and we should be careful about taking these at face value.

Another explanation invokes a covariate that we have not previously considered. Possibly it is not the corpus frequencies that we should pay attention to, but the distribution of stimulus words across the corpus. If a word appears in many subcorpora, it might be known to more speakers because it is used more universally than a word that is equally (or more) frequent but appears only in very specific subcorpora (possibly indicating niche or terminological usage). Lijffijt and Gries (2012) propose the normalized deviation of proportions (DPnorm) as a measure that operationalizes this distribution of words over subcorpora. However, the average DPnorm values for the German and borrowed neologisms used here do not differ (English = 0.449, German = 0.447, t(80.0) = 0.0784, p = 0.938; see Footnote 11). Also, the subject area to which the stimulus words belong (according to the neologism dictionary) has no influence on the reported effects: they are stable across all 19 subject areas.
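As an illustration of the dispersion measure, the following sketch computes DPnorm for a single word: half the summed absolute differences between the observed share of the word's occurrences in each subcorpus and the expected share given the subcorpus size, divided by one minus the smallest expected share. The function name `dp_norm` and the example counts are hypothetical; the subcorpus division underlying the values reported above is not reproduced here.

```python
def dp_norm(word_counts, part_sizes):
    """Normalized deviation of proportions for one word.

    word_counts: occurrences of the word in each subcorpus
    part_sizes:  size (in tokens) of each subcorpus, same order
    """
    total_word = sum(word_counts)
    total_size = sum(part_sizes)
    if total_word == 0:
        raise ValueError("word does not occur in the corpus")
    expected = [s / total_size for s in part_sizes]   # share of corpus per part
    observed = [v / total_word for v in word_counts]  # share of word per part
    dp = 0.5 * sum(abs(o - e) for o, e in zip(observed, expected))
    return dp / (1 - min(expected))

# Hypothetical counts in three equally sized subcorpora (not study data)
sizes = [1000, 1000, 1000]
even = dp_norm([10, 10, 10], sizes)    # perfectly even distribution
clumped = dp_norm([30, 0, 0], sizes)   # all occurrences in one subcorpus
print(even, clumped)
```

Values near 0 indicate that a word is spread across subcorpora in proportion to their sizes, whereas values near 1 indicate that it is concentrated in a small part of the corpus.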

Let us take a step back to our corpus study for one last potential explanation. In this study, we were able to find differences not only in terms of linguistic marking (see Section “Prior work” for details), but also in terms of frequency of the different groups of words. Although no overall difference was observable with respect to frequencies, the frequency progressions did differ significantly over time: borrowed neologisms remained at a similarly high-frequency level once they were introduced into the language. The average frequency of German neologisms, on the other hand, tended to decrease over time (see Fig. 3 in Klosa-Kückelhaus and Wolfer, 2020). In the present study, we controlled for overall frequency by including it as a covariate in the statistical models, but the detailed frequency progression in time is not captured by this covariate. As can be seen in Fig. 6, though, the frequency progressions, especially for neologisms from the 1990s, also differ for the stimulus words we used in the present study.

Fig. 6: DeReKo corpus frequency progressions for the stimulus words over the years (total corpus match count: 701,561 tokens).

Panels distinguish between levels of the factor “Time”. The stroke color distinguishes between levels of the factor “Origin”. For each group, we aggregated the frequencies per year and, to correct for short-term fluctuations, calculated a rolling mean with a window size of 3 years.
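The aggregation and smoothing step behind such a plot can be sketched as follows. The sketch assumes a centered window over consecutive years and drops years without a full window; whether the original plot used a centered or trailing window is not stated. The function names and the example counts are hypothetical.

```python
from collections import defaultdict

def yearly_totals(records):
    """Sum (year, count) records per year and sort by year."""
    totals = defaultdict(int)
    for year, count in records:
        totals[year] += count
    return dict(sorted(totals.items()))

def rolling_mean(series, window=3):
    """Centered rolling mean over a year -> value mapping.

    Assumptions: odd window size, consecutive years, and only years
    with a complete window are returned.
    """
    years = list(series)
    values = list(series.values())
    half = window // 2
    smoothed = {}
    for i in range(half, len(values) - half):
        chunk = values[i - half:i + half + 1]
        smoothed[years[i]] = sum(chunk) / window
    return smoothed

# Hypothetical per-word yearly counts for one stimulus group (not study data)
records = [(2010, 3), (2010, 5), (2011, 12), (2012, 4), (2013, 10), (2013, 2)]
totals = yearly_totals(records)
smoothed = rolling_mean(totals, window=3)
print(totals, smoothed)
```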

It is not (yet) possible to say whether English neologisms from the 2010s (right panel of Fig. 6) will also “outperform” German neologisms in terms of frequency in the same way as the neologisms from the 1990s did (left panel). But given the behavioral results from the present study, frequency progressions might indeed be a better predictor than frequency alone or the presence of linguistic markers in written texts.

There are, however, two problems with this potential explanation. First, given the frequency progressions in Fig. 6, we would have expected a more pronounced effect for neologisms from the 1990s. We did not find any indication of such an interaction effect for any of the mouse trajectory variables. Second, this does not explain why we should observe such an advantage of borrowed neologisms for younger speakers only. This would suggest that younger speakers are more sensitive to frequency progressions than older ones. Even if such evidence existed, it could not be separated from any other potential explanation. One such explanation could be, for example, that younger speakers simply are more exposed to the borrowed neologisms in their everyday lives and are thus more likely to accept them without giving their decision much thought (as opposed to the German neologisms they are not exposed to as frequently). Such an approach highlights how important the notion of “subjective frequencies” (Kuperman and Van Dyke, 2013) could be when interpreting the results of language processing studies.

To summarize, we come back to the assumptions we formulated in Section “Hypotheses”. Assumption 1 stated that the extent of flagging is an indicator of the degree of acceptance in the language. Given the results reported here, this conjecture must be questioned. The decision to linguistically flag a word is an individual decision of the respective author (see Winter-Froemel, 2023, 14–16 for a discussion of flagging as one way in which authors, i.e., journalists, choose to mark alterity and to enhance accessibility). Extrapolating this decision of an individual member of the speech community to the community as a whole may indeed be a misconception or too reductive, to say the least. Assumption 2 stated that a neologism’s degree of integration into a language is associated with higher insecurities when accepting it in a behavioral experiment. However, if we assume, as participants’ self-assessments suggest, that different cognitive processes were triggered for native and borrowed neologisms (elaborated semantic retrieval vs. rapid matching with known forms), this assumption might also be too simplistic.

Conclusion and implications

Given the results of our study, we assume that, compared to older speakers, millennials are indeed more confident in accepting borrowed neologisms than native neologisms, possibly due to their group-specific language experience. How exactly this language experience (and language attitudes that emerge from it) can be reliably operationalized, and what role corpus-based measures like frequency trends (as opposed to overall frequencies), dispersion over subcorpora, and linguistic marking/flagging play in this endeavor, remain subjects of further investigation. It might also be the case that English proficiency plays an important role in accepting neologisms formed with English material. We did not, however, collect English proficiency scores for our participants.

Moreover, due to the experimental paradigm used, we only presented isolated words. Whether the effects reported here similarly emerge for words in context (for example, Lombard et al. 2021 embedded French neologisms in sentences) also remains a question for future studies, which would then have to be realized in other experimental paradigms.