Language-Specific Non-Words for the Assessment of Working Memory: Dealing With Bilingual Children

Bilingual children with a limited command of their second language (L2) often obtain unsatisfactory results in L2-based non-word repetition tasks (NWRT) used to assess working memory. In this study, monolingual (MO) and bilingual (BI) preschool children acquiring German were compared with regard to their performance on German-based NWRT in order to identify test items that do not put BI at a disadvantage. Four-year-old children (N = 876) were tested with the German language screenings Sprachscreening für das Vorschulalter (SSV) and Kindersprachscreening (KiSS), which together include 38 non-words. BI scored significantly lower in NWRT than MO because of a limited command of German and limited language input (e.g., a shorter length of kindergarten attendance). After the statistical exclusion of children who did not speak German age-appropriately, BI outperformed MO on the SSV, with no significant differences on the KiSS. Performance on NWRT depended on the children’s command of German vocabulary and phonotactics. Several relatively “culture-fair” NWRT items were identified.

Performance on NWRT predicts, among other things, children's vocabulary acquisition (Marini et al., 2017), but in turn depends on vocabulary size, because long-term memory helps to maintain the memory trace, a theoretical construct describing how memories are physically stored in the brain (Gathercole, 2006). Phonological knowledge develops with the number of acquired words, which makes phonotactic knowledge dependent on the child's vocabulary size (Beckman et al., 2007). For instance, children with word-learning difficulties, such as those with specific language impairment (SLI), have been shown to have deficient phonological representations and to score lower in NWRT than their typically developing peers (Befi-Lopes et al., 2010). In children with weak phonological processing, lexical-semantic knowledge that supports pattern completion within the phonological system is often recruited to improve performance on short-term memory tasks, including NWRT (Savill et al., 2019). Thus, a child's acquired vocabulary shapes phonological representations and, when necessary, compensates for weak phonological processing.
Because of this link between working memory and vocabulary skills, children repeat non-words with a high lexical load more easily than items with a low load (Gathercole, 1995). For the same reason, BI score lower on non-words based on their L2 than MO do on non-words based on their L1, especially in tasks with a high semantic load (Yoo & Kaushanskaya, 2012). Many non-words not only follow the typical syllable structure and stress patterns of the target language; some are also deliberately designed to contain language-specific morphemes or to rhyme with existing words. Such items presuppose linguistic knowledge that many BI have not yet acquired, which puts them at risk of low NWRT scores and can contribute to the misidentification of language disorders.
Hence, crosslinguistic phonological and lexical transfer from the L1 is to be expected when BI perform L2-based NWRT (de Abreu et al., 2013). For this reason, the assessment of working memory in BI by means of language-specific NWRT is often criticized (e.g., Grimm, 2016). However, this critique has not changed daily clinical practice, because validated NWRT that account for phonotactic regularities across languages are lacking. As an alternative to the extremely challenging development and validation of such "universal" tests, relatively "culture-fair" items can be selected from existing language-specific tests, that is, non-words that do not yield significantly different results for MO and BI.
A large number of variables related to language input, including attendance of kindergartens, associations, and study groups, the age of onset of language acquisition, and language use at home, have been shown to be associated with children's language competence (Zaretsky et al., 2020). A link between these variables and children's performance on language-specific NWRT can therefore be expected, with a higher quality and quantity of language input associated with more correct responses in NWRT.
This study aimed (a) to examine whether BI of preschool age acquiring German are put at a disadvantage by commonly used German-based non-words, (b) to correlate children's performance on NWRT with their command of German phonotactics and vocabulary, as well as with the quality and quantity of their language input, and (c) to select items that are less influenced by German language skills and, thus, less biased. In a search for the most appropriate German-based non-words, BI and MO were compared with regard to their performance on two NWRT under two conditions: (a) all children irrespective of their German language skills and (b) only children with age-appropriate German language skills. For (a), it was hypothesized that some German-based non-words would put BI at a disadvantage because of their limited command of German. By contrast, for (b), no significant differences between BI and MO were expected.

Test Subjects and Test Battery
In this retrospective study, language test results of 876 children (479/54.7% male, 397/45.3% female; age 4 years 0 months to 4 years 11 months; mean and median age 51 months) from 154 German kindergartens were used. These data were originally collected to validate various language tests. More than half of the children were monolingual German speakers (n = 520/59.4%); all others acquired an additional language (n = 356/40.6%), most often Turkish (n = 65/7.4% of the sample), Arabic (n = 39/4.5%), or Russian (n = 36/4.1%). According to a Mann-Whitney U test, there was no statistically significant age difference between BI and MO.
All children were tested with the language screening "Sprachscreening für das Vorschulalter" (SSV; Grimm, 2003). The SSV includes the following 18 German-based non-words: Billop, Kalifeng, Defsal, Ronterklabe, Toschlander, Entiergent, Gattwutz, Glösterkeit, Dilecktichkeit, Krapselistong, Nebatsubst, Seregropist, Skatagurp, Waltikosander, Pristobierichkeit, Kabusaniker, Ippazeumerink, Vominlapertust. Additionally, all children were tested with the second version of the language test "Kindersprachscreening" (KiSS; Holler-Zittlau et al., 2011), which has subtests on speech comprehension, vocabulary, phonology, grammar, and working memory. KiSS categorizes children as (a) speaking German age-appropriately, (b) needing additional educational support (ED), or (c) needing additional medical support (MED) in acquiring German. In the terminology of the international CATALISE study, ED children are those who have had insufficient exposure to the language used by the school or community to be fully fluent in it and who do not have a language disorder (Bishop et al., 2017). MED children are those with language impairments (Bishop et al., 2016). Thus, ED children need more language input (e.g., a German language support course, as offered by many kindergartens), whereas MED children need some kind of medical assistance that would support the development of their L1 or L2 competence, such as hearing aids in the case of a hearing disorder. Deficits in working memory are characteristic of MED children (see Introduction), but not of ED children.
In KiSS, the same cut-off values are applied to BI and MO for the ED classification. For the MED classification, however, KiSS uses separate cut-off values for BI and MO, with lower values for BI. According to KiSS criteria, 240 children (27.4%) in the sample were classified as ED and 110 (12.6%) as MED. The two groups overlapped, because most MED children were also classified as ED (n = 83/9.5%). All other children (n = 609/69.5%) spoke German age-appropriately.
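Because most MED children were also ED, the number of children who spoke German age-appropriately follows from an inclusion-exclusion count rather than from subtracting both groups in full. The minimal Python sketch below reproduces the reported figures from the counts given above; the constant and function names are illustrative only.

```python
N_TOTAL = 876  # children in the full sample
N_ED = 240     # KiSS: needs educational support (ED)
N_MED = 110    # KiSS: needs medical support (MED)
N_BOTH = 83    # classified as both ED and MED


def share(count: int, total: int = N_TOTAL) -> float:
    """Percentage of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Inclusion-exclusion: children flagged as ED, MED, or both, each counted once.
n_flagged = N_ED + N_MED - N_BOTH
# All remaining children spoke German age-appropriately.
n_age_appropriate = N_TOTAL - n_flagged

print(n_flagged, n_age_appropriate, share(n_age_appropriate))  # prints: 267 609 69.5
```

Subtracting 240 and 110 separately would double-count the 83 children in both groups and understate the age-appropriate group.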
Because KiSS.2 contains only four non-words (Triser, Misküranok, Nabolira, Verklasenaft), direct comparisons with the SSV (18 items) were hardly possible. For such comparisons, a subsample of children who had been tested with an extended set of 20 non-words (Neumann & Euler, 2009) was used: the four items mentioned above plus Gune, Lamposta, Kaulen, Meling, Flome, Siraf, Schabu, Kneefode, Feilister, Posalin, Verwumpeln, Gestrabelt, Schrimane, Kirumander, Wasubritzen, and Berofterheit. This set of items was part of a previous KiSS version and was later reduced to four items. A total of 201 children were tested with this extended KiSS version (101/50.2% male, 100/49.8% female; 100/49.8% BI, 101/50.2% MO; age 4 years 0 months to 4 years 11 months; mean and median age 51 months), with no significant age differences between BI and MO. These children did not constitute a separate sample; because their KiSS subtests were identical (except for NWRT), they were part of the sample described above.
In addition to the language screening and NWRT, KiSS contains questionnaires for parents and kindergarten teachers, with items that can be categorized as follows: (a) demographic variables such as birth date, (b) quality and quantity of the language input in German and (an)other language(s) (e.g., age when the child began to acquire German), (c) children's and parents' language-related medical issues (e.g., language disorders in the family), and (d) participation in language courses and therapies.

Statistical Analysis
As it was hypothesized that the weaker German language skills of BI would be reflected in weaker performance on both NWRT, the German language skills of BI and MO were compared first. The BI/MO classification was cross-tabulated with the KiSS classifications as ED and MED, and chi-square tests were calculated. The performance of BI and MO on the KiSS subtests on vocabulary and phonology (two linguistic domains shown to be associated with NWRT results) was compared by Mann-Whitney U tests.
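The cross-tabulations with chi-square tests can be illustrated with a minimal Python sketch of Pearson's chi-square statistic for an r × c table, without Yates' continuity correction. The function names are hypothetical, the p-value helper covers only the 1-df case of a 2 × 2 table (e.g., BI/MO × ED yes/no), and the counts in the usage line are invented for illustration, not taken from the study; the statistical software actually used may apply different defaults.

```python
from __future__ import annotations

import math
from typing import Sequence


def pearson_chi_square(table: Sequence[Sequence[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof


def p_value_df1(chi2: float) -> float:
    """Upper-tail p for 1 df: a chi-square(1) variable is a squared standard normal."""
    return math.erfc(math.sqrt(chi2 / 2))


# Toy 2 x 2 table with invented counts: rows BI / MO, columns ED yes / no.
chi2, dof = pearson_chi_square([[30, 70], [20, 80]])
print(round(chi2, 3), dof, round(p_value_df1(chi2), 4))
```

The same function applies unchanged to the per-item tables of correct vs. incorrect answers by BI/MO.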
To analyze the association of the German language skills with the performance on NWRT, two classification trees were calculated (method "Exhaustive CHAID"), with total scores of correct answers in SSV and KiSS NWRT as dependent variables and (a) children's age in months, (b) classification BI/MO, (c) KiSS vocabulary score, and (d) KiSS phonology score as independent variables. For the same purpose, Spearman correlations between both NWRT and KiSS subtests on vocabulary and phonology were calculated.
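In CHAID-type trees with a continuous dependent variable, a candidate grouping of a predictor's categories is evaluated with a one-way ANOVA F test; for example, four terminal groups formed from 876 children give the F(3, 872) degrees of freedom reported for the SSV tree. The minimal Python sketch below computes only that F statistic for given groups. It is an assumption-labeled illustration and does not implement the category-merging, exhaustive search, or multiplicity-adjustment steps of Exhaustive CHAID.

```python
from __future__ import annotations

from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for k groups of scores."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy NWRT scores for two hypothetical groups of children.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))
```

A larger F means the candidate grouping separates NWRT scores more clearly, which is why the predictor with the strongest F appears first in the tree.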
NWRT from the SSV and KiSS were analyzed with respect to possible disadvantages for BI compared with MO. The total scores of correct answers in both NWRT were compared between BI and MO with two Mann-Whitney U tests. Next, the rates of correct and incorrect answers for each non-word were compared between BI and MO in cross-tables with chi-square tests. In addition, omissions of consonants in several selected consonant clusters (e.g., in Seregropist) were compared between the BI and MO subgroups using the same method. Such omissions were expected to occur more frequently in the BI subgroup because of a limited command of German phonotactics and the influence of other languages.
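The Mann-Whitney U comparison of total scores can be sketched in pure Python: pool both groups, rank the scores (tied scores share their average rank), and compare the rank sum of one group with its expectation under no group difference. This is a minimal sketch assuming the usual tie-corrected normal approximation without continuity correction; function names are hypothetical, and statistical packages may instead use exact tests or a continuity correction.

```python
from __future__ import annotations

import math
from typing import Sequence


def midranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> tuple[float, float, float]:
    """Return (U for x, z, two-sided p) using a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    tie_sizes: dict[float, int] = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value tied: no evidence of a difference
        return u_x, 0.0, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return u_x, z, p


# Toy NWRT totals (invented): hypothetical BI vs. MO scores.
print(mann_whitney_u([5, 7, 7, 9, 10], [8, 9, 11, 12, 12]))
```

Working on ranks rather than raw scores is what makes the test suitable for bounded, possibly skewed count scores such as the number of correctly repeated non-words.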
Associations between the total scores of correct answers in the SSV NWRT and children's sociodemographic characteristics were assessed by correlations. For these calculations, selected KiSS questionnaire items on the quality and quantity of L1 and L2 input were used (see Table 1). For ordinal variables (e.g., how often the child plays with German-speaking children), Spearman correlations were calculated; for dichotomous variables (e.g., whether the child attended a nursery school), point-biserial correlations were used. It was hypothesized that the link between NWRT performance and sociodemographic variables related to L1 and L2 input would be stronger for BI than for MO. The KiSS subsample was too small, so KiSS had to be excluded from these correlational analyses.
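Both correlation types named above can be expressed as Pearson's r on transformed data: Spearman's ρ is Pearson's r computed on ranks (tied values receive their average rank), and the point-biserial r is Pearson's r between a 0/1-coded dichotomous variable and the scores. The Python sketch below shows this under those standard definitions; function names are illustrative, significance testing is omitted, and the toy data are invented.

```python
from __future__ import annotations

import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def midranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r on midranks (handles ties)."""
    return pearson_r(midranks(x), midranks(y))


def point_biserial_r(flag: Sequence[int], scores: Sequence[float]) -> float:
    """Point-biserial r: Pearson's r between a 0/1-coded variable and scores."""
    return pearson_r([float(f) for f in flag], scores)


# Toy example: an ordinal item coded 1-5 and a 0/1 item vs. NWRT totals.
nwrt = [6, 8, 9, 11, 12]
print(spearman_rho([1, 2, 2, 4, 5], nwrt), point_biserial_r([0, 0, 1, 1, 1], nwrt))
```

Because the dichotomous coding is arbitrary, the sign of a point-biserial r depends on which category is coded 1, which matters when reading negative coefficients such as that for nursery-school attendance.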
Next, the performance of BI and MO on NWRT was assessed under the condition of comparable German language skills. All children classified as ED or MED were excluded. BI and MO who spoke German age-appropriately did not differ in age in months. Mann-Whitney U tests were used to verify whether significant differences between BI (n = 172) and MO (n = 417) in the total scores of correct answers in the KiSS subtests on vocabulary and phonology persisted after the exclusion of ED and MED children. Next, the total scores of correct answers in both NWRT were compared between BI and MO with two Mann-Whitney U tests. BI were expected to catch up with MO. Possible disadvantages for BI in each SSV and KiSS non-word were assessed by cross-tabulating the rates of correct and incorrect answers against the BI/MO classification. The cross-tables between the BI/MO classification and the variables on consonant omission in consonant clusters were also recalculated.
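The exclusion step can be sketched as a filter over child records. The `Child` record and its field names below are hypothetical (they are not KiSS variable names); the point illustrated is that a child is dropped if classified as ED, MED, or both, and the remaining scores are then split into BI and MO groups for the two-sample comparisons.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Child:
    """Hypothetical record; field names are illustrative only."""
    bilingual: bool
    ed: bool          # KiSS: needs educational support in acquiring German
    med: bool         # KiSS: needs medical support in acquiring German
    ssv_correct: int  # correctly repeated SSV non-words (0-18)


def age_appropriate(children: list[Child]) -> list[Child]:
    """Keep only children classified as neither ED nor MED."""
    return [c for c in children if not (c.ed or c.med)]


def scores_by_group(children: list[Child]) -> tuple[list[int], list[int]]:
    """Split SSV scores into (BI, MO) lists for a two-sample comparison."""
    bi = [c.ssv_correct for c in children if c.bilingual]
    mo = [c.ssv_correct for c in children if not c.bilingual]
    return bi, mo


# Invented toy records, not study data.
toy = [Child(True, False, False, 10), Child(True, True, False, 5),
       Child(False, False, True, 4), Child(False, False, False, 9)]
print(scores_by_group(age_appropriate(toy)))
```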
Because the BI subgroup was very heterogeneous, with dozens of ethnic and linguistic backgrounds, considerable stratification in NWRT performance could be assumed. These differences are illustrated here by a comparison of BI speaking European (n = 172) and non-European (n = 156) languages. For 28 BI, no reliable information on their L1 was available; they were therefore not included in either subgroup. Both "Europeans" and "non-Europeans" were on average 51 months old, with no significant difference in age. It was hypothesized that the "non-Europeans" would yield lower total scores of correct answers in NWRT because of a larger typological distance between German and their L1 than in the case of the "Europeans." After the percentages of ED and MED children in these two subgroups were compared in cross-tables, Mann-Whitney U tests were again calculated to assess differences between "Europeans" and "non-Europeans" in the total scores of correct answers in the SSV and KiSS NWRT. These Mann-Whitney U tests were repeated after the exclusion of all ED and MED children. The omission of consonants in consonant clusters was also compared between "Europeans" and "non-Europeans" in cross-tables before and after the exclusion of ED and MED children. Because both subgroups were BI, "non-Europeans" were not expected to outperform "Europeans" after the exclusion of children with limited German language skills. For all mean values (M), standard deviations (±) are reported. The effect size in all Mann-Whitney U tests was quantified with the probability of superiority index (p; Grissom & Kim, 2012). Values close to .5 indicate a small effect and values close to .0 a large one (with the order of the groups reversed, values close to 1.0 indicate a large effect).
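The probability of superiority can be estimated by comparing every child in one group with every child in the other. The sketch below uses the common estimator that counts tied pairs as one half, which equals the Mann-Whitney U of the first group divided by n_x · n_y; whether this matches the exact computation used in the study is an assumption, and the function name is hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def probability_of_superiority(x: Sequence[float], y: Sequence[float]) -> float:
    """Share of all (x_i, y_j) pairs with x_i > y_j; tied pairs count one half.

    A value of .5 means no difference; values near 0 or 1 indicate a large
    effect, depending on which group is passed first.
    """
    wins = 0.0
    for a in x:
        for b in y:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(x) * len(y))


# Toy NWRT totals (invented) for two hypothetical subgroups.
print(probability_of_superiority([6, 8, 9, 11], [7, 9, 10, 12]))
```

Swapping the arguments returns 1 minus the original value, which is why the direction of the comparison must be stated when a value near .0 is read as a large effect.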
Next, the hierarchy of factors influencing SSV and KiSS NWRT results was assessed by two classification trees. For the total score of correctly repeated SSV non-words, the children's German phonology skills, quantified by KiSS, were the most important factor, F(3, 872) = 58.5, p < .001. Children with a KiSS phonology score < 5 repeated, on average, 5.4 ± 3.4 non-words correctly; those with a phonology score of 5-7 repeated 7.6 ± 3.6 non-words; those with a score of 8-9 repeated 9.4 ± 3.5 non-words; and those with a score above 9 repeated 11.0 ± 3.6 non-words. For the latter two subgroups, the next most important factor was the KiSS vocabulary score, F(1, 416) = 12.5, p < .001, and F(1, 275) = 22.2, p < .001, respectively, with higher vocabulary scores associated with better NWRT results. Other factors did not appear in the classification tree. For KiSS, only one factor was associated with the total score of correctly repeated non-words, F(1, 199) = 31.7, p < .001: children with a KiSS vocabulary score < 5 yielded, on average, 12.7 ± 4.2 correct answers in this NWRT, and children with a vocabulary score of 5 or higher yielded 15.7 ± 3.2 correct answers.

Table 1. Correlations between total SSV NWRT scores of BI and KiSS questionnaire items on language input
Questionnaire item | ρ or r_pb | n
QK: Length of kindergarten attendance: 0-14 vs. 15+ months | r_pb = .17*** | 261
QK: Attendance of a nursery school in the first 2 years of life (yes/no) | r_pb = −.18** | 248
QK: Regular or irregular kindergarten attendance | r_pb = −.16* | 172
QK: Kindergarten attendance for half a day or full day | r_pb = .14 | 183
QK: The child speaks out when playing (never, seldom, sometimes, often, always) | ρ = .23** | 184
QK: There is at least one more child speaking the same non-German language in the kindergarten group (yes/no) | |
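The first-level splits of the two classification trees reported above can be read as simple lookup rules from a KiSS subtest score to the mean NWRT result of the corresponding node. The Python sketch below encodes only those reported node means, assuming integer subtest scores; the second-level vocabulary splits within the 8-9 and >9 phonology nodes are omitted because their cut-offs are not reported, and the function names are hypothetical.

```python
def expected_ssv_correct(kiss_phonology: int) -> float:
    """Mean SSV non-words correct (of 18) in the first-level node for a phonology score."""
    if kiss_phonology < 5:
        return 5.4
    if kiss_phonology <= 7:   # phonology scores 5-7
        return 7.6
    if kiss_phonology <= 9:   # phonology scores 8-9
        return 9.4
    return 11.0               # phonology scores above 9


def expected_kiss_correct(kiss_vocabulary: int) -> float:
    """Mean KiSS non-words correct (of 20) for a KiSS vocabulary score."""
    return 12.7 if kiss_vocabulary < 5 else 15.7


for score in (3, 6, 9, 11):
    print(score, expected_ssv_correct(score), expected_kiss_correct(score))
```

These node means are group averages with standard deviations of roughly 3 to 4 non-words, so they describe the trend across subgroups rather than a prediction for an individual child.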
The same findings, namely a high association of the KiSS phonology score with performance on the SSV NWRT and a high association of the KiSS vocabulary score with performance on the KiSS NWRT, were confirmed by Spearman correlations (ps < .001). The total score of correctly repeated SSV non-words correlated less strongly with the KiSS vocabulary score than with the KiSS phonology score (ρ = .31 vs. .42), whereas the total score of correctly repeated KiSS non-words correlated more strongly with the vocabulary subtest than with the phonology subtest (ρ = .40 vs. .26).
A link between NWRT performance and children's sociodemographic characteristics was assessed separately for BI and MO. Several questionnaire items were correlated with the total SSV NWRT score. For MO, only kindergarten attendance for half a day vs. a full day yielded a significant result (with a higher NWRT score for full-day attendance), r_pb = .15, p = .026, n = 218. Results for BI are given in Table 1. In all correlations, a larger quantity and/or higher quality of German language input was associated with a higher NWRT score.

Discussion
Although NWRT in numerous language tests, including the SSV, were designed to identify children with language-related medical issues, their cut-off values often implicitly presuppose a monolingual background. Performance on working memory tasks depends on the command of, and input in, the respective language (Szewczyk et al., 2018). In BI, limited German language skills and the influence of L1 phonotactics can result in low total scores of correctly repeated non-words even in children without any language-related impairment. As a consequence, BI can be misidentified as having a language impairment and be referred for unnecessary medical examinations and therapies.
This study aimed to analyze differences in the repetition of language-specific (here, German-based) non-words by MO and BI living in Germany, including factors of influence related to language input, and to find the most appropriate, that is, "culture-fair" items for the assessment of working memory in BI. As expected, BI scored significantly lower than MO in both NWRT examined (those from the SSV and KiSS), probably (a) because they needed educational assistance in acquiring German significantly more often than MO and (b) because of transfer from their non-German L1. After the exclusion of children with limited German language skills, BI outperformed MO in the SSV NWRT and scored on the same level as MO in the KiSS NWRT.
The weaker results of BI, compared with MO, in NWRT can be explained by BI's weaker command of German phonotactics and vocabulary. BI not only scored lower in the respective KiSS subtests but also simplified consonant clusters in the non-words more often than MO. Consonant clusters are highly language-specific and are therefore sometimes criticized as inappropriate for NWRT (Grimm, 2016). Because most non-words in the SSV and KiSS contain consonant clusters, in addition to German-specific phonemes such as /ʃ/, /ʔ/, and /ŋ/, this may have contributed to the lower scores of correct answers in NWRT in the BI subgroup compared with MO. BI scored significantly lower in six of 18 SSV items and in five of 20 KiSS items. Notably, some alternative NWRT, such as the "quasi-universal" non-words suggested by Chiat (2015), contain no consonant clusters at all, precisely to avoid this disadvantage for BI.
The dependence of NWRT results on German language skills was confirmed by significant associations between performance on the SSV and variables related to the quality and quantity of German language input. For MO, only one very weak correlation was identified: children attending kindergarten for a full day yielded higher scores of correctly repeated non-words than children attending only for half a day. For BI, most correlations were significant. Children with lower total scores of correct answers in NWRT attended kindergartens, nursery schools, associations, and study groups less often than children with higher total scores. They also played less often with German-speaking children, both in kindergarten and elsewhere, and spoke out less often when playing. A late onset of German language acquisition was also associated with low NWRT results. To sum up, in contrast to MO, BI with limited German language input outside their household did score lower in NWRT, although such tasks are designed to assess working memory rather than German language skills.
After the exclusion of ED and MED children, BI still scored significantly lower than MO in the vocabulary task but outperformed MO in the SSV non-words and performed on the same level as MO in the KiSS non-words. As shown by the classification trees and confirmed by Spearman correlations, the SSV NWRT total score was most closely associated with the total score of correct answers in the KiSS phonology subtest, and the KiSS NWRT total score with the KiSS vocabulary subtest. Because BI still lagged behind MO in German vocabulary, but not in phonology, even after the exclusion of ED and MED children, they could not outperform MO in the KiSS non-words. In contrast, a comparable command of German phonotactics contributed to the better result of BI, compared with MO, in the SSV non-words. Additionally, the rates of consonant omission in consonant clusters became comparable in the BI and MO subgroups. The exclusion of ED and MED children even resulted in higher rates of correct answers for BI than for MO in three non-words, although only in items where the differences had not been statistically significant before children with weak German language skills were excluded. MO still scored higher than BI in two of the 38 items.
The finding that BI with age-appropriate L2 skills can perform on working memory tasks on the same level as MO is consistent with the results of some previous studies using other language tests. Schöler et al. (2005) found no significant differences between BI and MO in NWRT from the German tests "Heidelberger Auditives Screening in der Einschulungsuntersuchung" (HASE; Brunner & Schöler, 2001) and "Bielefelder Screening zur Früherkennung von Lese-Rechtschreibschwierigkeiten" (BISC; Jansen et al., 2002). In some comparatively rare studies, BI even outperformed MO in repetition tasks (Bastian et al., 2018). Because working memory is trainable (Lee et al., 2017), the double demand that bilingualism places on executive functions may contribute to a better development of their subsystems, including working memory. Accordingly, the significantly better results of BI, compared with MO, in the SSV non-words after the statistical exclusion of children with limited German language skills may be attributable to more developed working memory resulting from the acquisition of two or more languages (Blom et al., 2014).
BI do not constitute a homogeneous group with regard to the quality and quantity of their contact with the German language and, consequently, with regard to their German language skills. In previous research, children from non-European countries, such as Arab and Turkish children, were shown to acquire German under unfavorable sociodemographic conditions and to have a limited command of German, compared with Greek- and English-speaking children (Zaretsky & Lange, 2015; Zaretsky et al., 2013). In this study, children speaking European languages likewise outperformed "non-Europeans" in NWRT, probably because of better German language skills and comparatively small typological differences between their L1 and German. After the exclusion of ED and MED children from both subgroups, no significant differences between "Europeans" and "non-Europeans" in NWRT were found. Omissions of consonants in consonant clusters still occurred more often in the non-European subgroup, but only in one word, compared with three words before the exclusion of children with a weak command of German.
Most effect sizes in the comparisons of the chosen subgroups (BI vs. MO, "Europeans" vs. "non-Europeans") were low. The same applies to the correlations between NWRT results and questionnaire items on the quality and quantity of language input (see Table 1). Indeed, numerous intra- and extralinguistic variables, such as the length and phonological complexity of test items (Cilibrasi et al., 2018), children's biological sex (Lange & Zaretsky, 2021), and familial socioeconomic status (Korecky-Kröll et al., 2019), to name just a few, contribute to performance on NWRT. Bilingualism is thus only one of many factors influencing the results of working memory tests.
The performance of the BI subgroup on working memory tasks resembled, to a certain degree, that of children with SLI in other studies. Phonological representations in the L2 are not yet fully formed in BI, and they are deficient in children with SLI because of word-learning difficulties (Szewczyk et al., 2018). Like children with SLI (Jones et al., 2010), BI in the current study scored low in the repetition of non-words with a high lexical load: items containing German morphemes (the suffix -keit in Dilecktichkeit) and items likely to be reinterpreted as familiar words (Toschlander, cf. the popular sausage brand Deutschländer). In MO, such items activate frequently used phonological representations that BI have not yet formed.
The following non-words proved to be comparatively "culture-fair," that is, they put neither BI nor MO at a disadvantage and thus probably measured working memory more than German language skills: (a) SSV: Billop, Defsal, Ronterklabe, Entiergent, Krapselistong, Nebatsubst, Seregropist, Pristobierichkeit, Ippazeumerink; (b) KiSS: Gune, Lamposta, Kaulen, Triser, Siraf, Schabu, Feilister, Posalin, Verwumpeln, Gestrabelt, Schrimane, Kirumander, Wasubritzen, Misküranok, Nabolira. One limitation of the study is that BI's competence in their non-German L1 was not assessed. Such tests were not available for most languages spoken by the children in the sample and were not of interest for the original test validation studies that provided the data for the retrospective analyses presented here. Therefore, it remains unknown how many children lagged behind in the acquisition of their non-German L1, which would have been an important indicator of language impairment.
To conclude, NWRT yielded misleading results for BI. Insufficient German language skills and transfer from the L1 impede the interpretation of NWRT results and interfere with the reliable identification of MED children. Numerous German-based non-words from the two tests examined were shown to put BI at a disadvantage. Therefore, the use of "quasi-universal" non-words, such as those proposed by Chiat (2015), is recommended instead of language-specific ones. However, "quasi-universal" items have not yet been validated for German, which should be done in future research. Alternatively, comparatively "culture-fair" items, such as those identified in this study, should be included in working memory tests.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.