A behavioural exploration of language aptitude and experience, cognition and more using Graph Analysis

Language aptitude has recently regained interest in cognitive neuroscience. Traditional language aptitude testing included phonemic coding ability, associative memory, grammatical sensitivity and inductive language learning. Moreover, domain-general cognitive abilities are associated with individual differences in language aptitude, together with factors that have yet to be elucidated. Beyond domain-general cognition, it is also likely that aptitude and experience in domain-specific but non-linguistic fields (e.g. music or numerical processing) influence and are influenced by language aptitude. We investigated some of these relationships in a sample of 152 participants, using exploratory graph analysis, across different levels of regularisation, i.e. sensitivity. We carried out a meta cluster analysis in a second step to identify variables that are robustly grouped together. We discuss the data, as well as their meta-network groupings, at a baseline network sensitivity level, and in two analyses, one including and the other excluding dyslexic readers. Our results show a stable association between language and cognition, and the isolation of multilingual language experience, musicality and literacy. We highlight the necessity of a more comprehensive view of language and of cognition as multivariate systems.


Introduction
Individual differences in language aptitude. Since its birth and development as a construct, language aptitude has been a fertile field of research that has recently seen a revamped interest in the cognitive neurosciences (Christiner et al., 2022; Feng et al., 2021; Turker et al., 2017, 2019; Hintz et al., 2023). Indeed, there is evidence for large individual differences in language skill, at different levels of language processing. For example, adults differ widely in how well they can hear and imitate subtle foreign speech sounds, and these individual differences have been found to be related to individual differences in brain anatomy in the very regions known to underlie auditory and phonetic perception (Wong et al., 2008; Golestani and Pallier, 2006; Turker et al., 2017, 2019) and articulation (Golestani and Pallier, 2006). There are also individual differences in other aspects of linguistic skill, as people differ in their vocabulary size (Richardson et al., 2010), in their ability to infer word meaning from context (Cain et al., 2004), in their semantic capacities (Rabini et al., 2023), in their grammatical skills (Kepinska et al., 2017a, 2017b) and in their language comprehension skills more generally (Blott et al., 2023). Interestingly, although individual differences in native language skills are assumed to be at ceiling in healthy individuals, at least for certain domains (pronunciation, basic grammatical patterns), research shows relationships between non-native and native task performance when the latter is tested using sufficiently difficult tasks or with brain imaging techniques, at least at the phonetic level (Díaz et al., 2008; Kartushina et al., 2016a, 2016b). Cross-linguistic associations have been found in skill learning (see e.g. Berthele and Lambelet, 2017 for an overview of studies on literacy acquisition in two languages), and first-language metrics predict second language abilities longitudinally (Skehan and Ducroquet, 1988). Evidence that adult monolinguals attend to different morphosyntactic cues, and thus have different mental grammars, could be at the source of such differences in second language attainment (Dąbrowska, 2012).
The evolution of language aptitude testing. Traditional language aptitude test batteries such as the Modern Language Aptitude Test (MLAT) aim to test such individual differences using subtests designed to tap into the following three or four dimensions: phonemic coding ability (the capacity to work with the phonological inventory of a given language), associative memory (the capacity to form associative links in memory), grammatical sensitivity (the capacity to identify the functions that words fulfil in sentences and their relationships with one another) and inductive language learning (the capacity to create novel, correct sentences by using linguistic rules) (Carroll and Sapon, 1959), using linguistic or language-neutral tasks. While the classic MLAT only assessed the first three of these dimensions, later developments such as the Pimsleur Language Aptitude Battery (PLAB; Pimsleur et al., 2004) also include tasks operationalising inductive ability. Due to their similarity, the latter two categories have been combined under the concept of 'language analytic ability' (Skehan, 2002; Wen et al., 2017). Performance on these respective subtests is known to be predictive of second language learning capacities at the phonetic, lexical and grammatical levels, respectively (Skehan, 1991).
General cognition and language aptitude. Although traditional language aptitude testing is limited to these three (or four) aspects of learning, there is evidence for the role of more domain-general cognitive processes and mechanisms (e.g. working memory) in determining individual differences in language aptitude (Li et al., 2019), although the extent of such influences depends on the age range tested, on the tasks used, and likely on other factors that have yet to be elucidated (Hintz et al., 2023). Several recent studies on language aptitude have therefore also assessed domain-general cognitive measures of intelligence (i.e. IQ) and working memory, and/or declarative and procedural learning abilities (Feng et al., 2021; Hintz et al., 2023; Turker et al., 2017, 2019), in addition to assessing language-specific skills. Other factors known to be important for language learning are attitudes towards it, such as anxiety and motivation (Dörnyei, 2006; Gardner et al., 1997). Work on young learners has yielded robust associations of intrinsic motivation and anxiety with foreign language learning, regardless of the target language (Udry and Berthele, 2021).
Memory and language aptitude. In line with the idea that domain-general cognition intervenes in the modulation of specific functions (Kelly and Martin, 1994), models have been proposed outlining the role in language of general higher-order mechanisms such as memory (Ullman, 2004) or pattern learning (Goffman and Gerken, 2020). For example, the Declarative-Procedural theory (Ullman, 2016, 2023) proposes that language learning, storage and use depend heavily on the declarative and procedural memory systems, and on their neurobiological substrates. Thus, in the present work, we employ tests that tap into language-specific and (syntactic) pattern learning, as well as tests of procedural and declarative memory, in order to shed light on these posited associations.
Musicality and language aptitude. Beyond domain-general cognition, it is also likely that aptitude and experience in domain-specific but non-linguistic fields such as music or numerical processing interact with (influence, and are influenced by) language learning. For example, music and spoken language are known to share certain features, including but not limited to their input via the auditory modality, their sequential unfolding over time with subsequent abstraction to higher-level representations, rules governing the way that sub-elements can be combined (i.e. syntax), semantics (propositional in language but not in music), written forms (alphabets and musical notation) and the fact that both are higher-level forms of cognition and communication unique to humans (Sammler and Elmer, 2020). In line with this, there is a large body of research showing relationships between aspects of language and music processing, and some generalisation from learning in one of these domains to the other (Nayak et al., 2022). For example, although the results are somewhat mixed, there is evidence that music and auditory training help to remediate aspects of reading disorder and of dyslexia (Cancer and Antonietti, 2022; Gordon et al., 2015; Rolka and Silverman, 2015). Further, research has shown that highly musical people (by predisposition) or musicians (by training) benefit from their musical skills in the domain of language, and vice versa (Turker and Reiterer, 2021). Conversely, certain aspects of language experience, in particular experience with tonal languages, have been shown to be associated with better discrimination of musical melodies (Liu et al., 2023). Thus, although there are also many differences between language and music (Haiduk and Fitch, 2022), the parallels between the two domains likely underlie the reported mutual interactions. This, in turn, justifies more careful exploration of musical experience and aptitude in neurocognitive investigations of language aptitude, as has been done in a few recent studies on language learning aptitude in healthy children and teenagers (Turker et al., 2019), in adults (Turker et al., 2017), and in individuals with dyslexia (Christiner et al., 2022).
Numeracy and language. Numerical processing is another higher-level cognitive domain involving the manipulation of rule sets, structuring principles, and logical and hierarchical operations. Numerical skills and aspects of language (such as reading) share some mental processes, including general ones (e.g. executive function, working memory and attention), but also more specific ones, including the learning of associations between sounds, their abstract representations and visual symbols, subserved by associative learning and by auditory/phonological processing. The acquisition of both reading and mathematics involves rapid mental manipulation of these different types of information, and relies on implicit or explicit speech sound manipulation. It is thus perhaps not surprising that there are relationships between language and mathematical cognitive development (Wicha et al., 2018), and high rates of comorbidity between reading and arithmetic deficits (Landerl and Moll, 2010), likely explained by shared genetic risk (van Bergen et al., 2023). Further, research suggests that there are genetic contributions to numeracy, with smaller contributions of specific environmental factors (Grasby et al., 2016, p. 9), and that the neural activations for auditorily presented language comprehension and mathematical operations are largely heritable, correlate with cognitive skills, and activate complementary networks on the inferior frontal and superior temporal cortical surface (Le Guen et al., 2018). Numerical processing is therefore another domain that warrants inclusion in studies of language aptitude, yet few such studies have assessed arithmetic skills so far (Turker et al., 2019).
Experience and predisposition. Other questions important to investigations of language aptitude are those of causality, and of the relative roles of experience and predisposition. Indeed, there is evidence that lower-level (e.g. acoustic-perceptual, motor) (Vandermosten et al., 2010) and higher-level cognitive (e.g. executive and memory) abilities, as well as non-linguistic skills (e.g. pattern detection abilities, motor skills), modulate individual differences in specific aspects of language learning capacity. Moreover, some are likely themselves modified by language learning and by multilingual language experience (DeLuca et al., 2020; Kormi-Nouri et al., 2003; Voits et al., 2022). This is in line with the fact that, although it was traditionally assumed that language aptitude arises from innate predisposition (Carroll, 1981), modern views consider it to be more dynamic, modulated by age (Wen, 2019) or electivity (Eisenstein, 1980; Grigorenko et al., 2000; Ramoser et al., 2024), though not all studies agree that elective multilinguals have higher aptitude scores (Harley and Hart, 1997; Sawyer, 1992). In the neurosciences, while it is well established that the brain can change both functionally (Stein et al., 2009) and structurally (Hervais-Adelman et al., 2017; Stein et al., 2012) due to (language) learning, there is also research suggesting that some aspects of brain anatomy and language skills may arise from innate predisposition (Golestani et al., 2011). Moreover, it is known that genetics plays a role in the likelihood of developing reading disorder (Doust et al., 2022), itself known to lie on a continuum within the normal reading skill range (Pinel et al., 2012). Thus, nature and nurture not only both play a role but likely interact (van Bergen et al., 2014) in determining language aptitude.
Aims and scope. This exploratory project aims to uncover some of these relationships: through network psychometric methods, we explored a broad array of behavioural language aptitude measures as well as non-linguistic, domain-general and domain-specific cognitive measures, including memory, intelligence, and musical and mathematical abilities. This project expands on similar recent ones in the field by taking a broader view of the potential mechanisms underlying language aptitude and its relationship with cognitive measures. First, as already mentioned, we included a very broad array of linguistic, cognitive and perceptuo-motor skills, thereby bringing together in the same study measures that have previously been studied separately. Second, we aimed to maximise the chance of detecting variability, i.e. individual differences, across our measures by including individuals who differed widely in their language skills (by including healthy readers but also people with a history of reading disorder) and experience (by including early and late, elective and non-elective multilinguals). We quantified multilingualism in a continuous (DeLuca et al., 2019) and multifactorial manner by computing multiple measures of language experience entropy (Kepinska et al., 2023) on self-reported measures of speaking, comprehension and multilingual exposure. Regarding the subgroup of individuals with a previous diagnosis of dyslexia, note that (1) many were university students who thus might have compensated for their deficit, and (2) participants with an undiagnosed, underlying dyslexia-continuum disorder may have been included. Our participant sampling approach allowed us to look for associations between, on the one hand, multilingualism and/or reading disorder, and on the other hand, language aptitude and/or performance in other cognitive or perceptuo-motor domains, if present. Using exploratory network analyses, as have recently been used in second language acquisition research (Freeborn et al., 2022), we aimed to uncover associations and dissociations between different aspects of linguistic and non-linguistic processing, and to shed light on language aptitude in the very broad sense of the term.
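The entropy-based quantification of language experience mentioned above can be illustrated with a minimal sketch. This is not the full pipeline of Kepinska et al. (2023), which derives several entropy measures from the questionnaire data; it only shows the core Shannon-entropy computation, assuming self-reported usage proportions per language:

```python
import math

def language_entropy(proportions):
    """Shannon entropy (in bits) of self-reported language-use proportions.

    Higher values indicate more balanced use across languages;
    0 corresponds to exclusive use of a single language.
    """
    total = sum(proportions)
    if total == 0:
        raise ValueError("at least one non-zero proportion is required")
    # Normalise and drop zero entries (0 * log 0 is taken as 0).
    probs = [p / total for p in proportions if p > 0]
    return -sum(p * math.log2(p) for p in probs)

# A balanced bilingual vs. a dominant-language profile:
print(language_entropy([0.5, 0.5]))  # 1.0 bit: maximally balanced bilingual
print(language_entropy([0.9, 0.1]))  # ~0.47 bits: strongly dominant language
```

On this measure, a strict monolingual scores 0, while entropy grows with both the number of languages and the balance of their use, which is what makes it a convenient continuous index of multilingualism.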

Participants
Participants were recruited from the Geneva region, neighbouring French-speaking cantons in Switzerland, and nearby France via the distribution of flyers and online advertisements. According to the Organisation for Economic Co-operation and Development (OECD), around 40 % of people aged 25 to 34 in Switzerland have upper secondary or post-secondary non-tertiary education, around 50 % have tertiary education, and around 10 % have below-secondary education (OECD, 2023a, 2023b). Switzerland is also known to be a relatively multilingual country: according to the Federal Population Census Structural Survey, French is considered a primary (dominant) language by 23.4 % of the population, with German, Italian and Romansch accounting for the remaining 77 %, and other, non-national co-existing languages accounting for 22.8 % of reports. Mindful that any study analysing the interaction between complex cognitive domains and language skills is (relatively) context-dependent (Milfont and Klein, 2018), we defined our target participants as healthy, relatively well-educated and multilingual adults with French as their first or most dominant language. Given the extent of the study and the finite time and resources for testing and analysis, we aimed for a sample size of 150.
The final sample comprised 152 healthy, adult francophone individuals (102F, mean age = 25 years, SD = 6.06 years, range = 18-47 years). Individuals with professional qualifications in music or simultaneous interpreting were excluded. Individuals with developmental dyslexia who met the study's inclusion criteria were included if they had a previous official diagnosis (N = 29). Individuals proficient in more than six languages were accepted; if French was not their native language, they had to demonstrate advanced proficiency in French. All eligible participants provided signed informed consent for all subsequent experimental procedures and for data reuse within the open science framework.

Data collection
Data were gathered online and in person by a team of six data collectors (three native speakers of French and three upper-intermediate or advanced speakers of French). All interactions and documentation throughout the process were in French. The study received ethical approval from the Geneva Cantonal Ethical Commission (CCER) under Protocol N. 2021-01004. Data collection occurred over multiple sessions, as follows.
Session 1 involved unsupervised, online survey data collection. Session 2 involved online behavioural data collection, supervised by an experimenter. Session 3 involved additional behavioural data collection, this time in person, for tests that could not be conducted remotely, and Session 4 involved the collection of neuroimaging and genetic data (not included in this work). All questionnaires and tasks were presented in French or adapted for use in French, with verification by at least one native French speaker. Survey instructions were presented in written form. Unless otherwise specified, behavioural task instructions and audio stimuli were read aloud by a commercial AI voice generator (female; https://www.naturalreaders.com/commercial.html; voice "Renee France", speed = -1).

Pilot study
Before data collection, an explicit grammar learning task called ArtGram was developed and pilot-tested. The ArtGram test, developed for adult participants, extended the PLAB4 task used in language aptitude testing for children (Pimsleur, 1966; Pimsleur et al., 2004). The test consisted of a 3-minute block to learn a small base-form lexicon from an artificial language, together with sample sentences using inflected forms, followed by a self-paced multiple-choice translation task with 12 novel sentences, performed within a time limit of 15 min. The language had free constituent order, with an inflectional structure denoting the nominative (subject), accusative (direct object), dative (indirect object) and ablative (adverbial of means) cases, and singular and plural number. The verb was consistently conjugated in the third person, either singular or plural. To mitigate potential confounding factors related to explicit memory, participants had access to the sample dictionary throughout the task. The pilot study was run to assess four questions: (1) the characteristics and feasibility of the explicit grammar learning task, (2) the reliability of the new online platform for behavioural data collection, (3) redundancy between the new task and another task testing implicit grammar learning, and (4) possible redundancies within a set of declarative and verbal memory tasks planned for inclusion in the final battery. The tasks were delivered to 20 native French speakers without a previous history of language or reading disorders (11F, mean age = 27.65, SD = 8.9) in a videoconference session overseen by an experimenter. Results indicated that the grammar tasks, as well as the scores from the memory tasks, all showed low intercorrelations (i.e. all correlations below 0.5), suggesting that they measured related but fundamentally different skills. Therefore, none of the tasks were excluded from the assessment battery. Pilot results are shown in the online Supplementary material.
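The redundancy criterion used in the pilot (all pairwise correlations below 0.5) can be sketched as follows; the task names and scores below are hypothetical and purely for illustration:

```python
from itertools import combinations
from statistics import mean

def pearson_r(x, y):
    """Plain Pearson correlation coefficient between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    varx = sum((a - mx) ** 2 for a in x)
    vary = sum((b - my) ** 2 for b in y)
    return cov / (varx * vary) ** 0.5

def redundant_pairs(scores, threshold=0.5):
    """Return task pairs whose absolute correlation meets the threshold."""
    return [
        (t1, t2)
        for t1, t2 in combinations(scores, 2)
        if abs(pearson_r(scores[t1], scores[t2])) >= threshold
    ]

# Hypothetical pilot scores for three tasks (one participant per position):
scores = {
    "ArtGram":  [12, 9, 14, 7, 11],
    "Brocanto": [30, 35, 28, 33, 31],
    "CVLT":     [55, 48, 60, 45, 52],
}
# Any pair reaching |r| >= 0.5 would be a candidate for exclusion:
print(redundant_pairs(scores))
```

In the actual pilot no pair crossed the threshold, so all tasks were retained; with the made-up scores above, several pairs would be flagged.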

Session 1: questionnaire completion
Participants completed a series of online questionnaires using Qualtrics XM©, and were instructed to proceed without interruption. The Language Experience and Proficiency Questionnaire (LEAP-Q; Marian et al., 2007), due to its length, was completed last. Highly multilingual individuals were allowed breaks during the LEAP-Q, but were encouraged to submit their responses as soon as possible.

Session 2: online, supervised behavioural testing
The second session involved supervised behavioural data collection through Zoom©. Participants were instructed through a demonstration video to share both their sound and their entire screen, to use wired headphones (in order to provide reliable online measurements (Woods et al., 2017; Zhao et al., 2022)), and to ensure that their microphone was functioning well. Headphone and microphone tests from the Gorilla open materials section were included as a mandatory prerequisite for beginning the task sequence. Researchers supervised the session without interfering unless technical issues arose. Tasks were delivered via the Gorilla web interface (Anwyl-Irvine et al., 2020). The Gorilla system would block the session if participants were on a mobile phone or tablet rather than a computer, or if they were using a web browser other than Mozilla Firefox© or Google Chrome©.
Participants autonomously navigated through tasks in a predetermined order among 15 possible pseudo-randomisations, one of which was automatically assigned by the system upon initiation of the testing sequence by each participant, blind to the experimenter. This approach was chosen over complete random assignment to prevent potential spurious effects resulting from certain tasks influencing each other when presented consecutively. Before each task, participants received on-screen written instructions in French, and were unable to proceed until the natural reader had finished delivering the instructions. Fixed-length breaks (3 or 5 min) were allowed after the most intensive tasks, to optimise compliance. A 10-minute break was inserted between the short- and long-term blocks of the DecLearn task (see Tasks). Participants could advance before the end of the break if they felt ready to continue, while a 60-second timer would appear in the last minute of the break if the participant had not yet proceeded to the next task.
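The rationale behind pseudo-randomisation (avoiding orders in which mutually influencing tasks run back to back) can be sketched as follows. The adjacency constraints and the short task list here are illustrative, not the actual ones used in the study:

```python
import random

# Hypothetical pairs of tasks that should never run consecutively,
# e.g. two grammar-learning tasks that could prime each other:
FORBIDDEN_ADJACENT = {frozenset({"ArtGram", "Brocanto"}),
                      frozenset({"MLAT", "DecLearn"})}

def is_valid_order(order):
    """Reject any order in which a forbidden pair appears consecutively."""
    return all(
        frozenset({a, b}) not in FORBIDDEN_ADJACENT
        for a, b in zip(order, order[1:])
    )

def build_orders(tasks, n_orders=15, seed=42):
    """Generate n_orders distinct pseudo-random task orders."""
    rng = random.Random(seed)
    orders = []
    while len(orders) < n_orders:
        candidate = rng.sample(tasks, len(tasks))
        if is_valid_order(candidate) and candidate not in orders:
            orders.append(candidate)
    return orders

tasks = ["ArtGram", "Brocanto", "MLAT", "DecLearn", "Corsi", "DigitSpan"]
orders = build_orders(tasks)
# Each participant is then assigned one of the pre-generated orders
# automatically at session start, blind to the experimenter.
print(len(orders))  # 15
```

Fixing a seed makes the set of 15 orders reproducible, while the constraint check guarantees that no participant ever receives two mutually priming tasks in a row.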

Session 3: in-person behavioural testing
In this session, participants were tested in person for tasks requiring ecological delivery, closer supervision, or the use of the same hardware across the cohort. Testing took place at the Human Neuroscience Platform of "Campus Biotech" in Geneva, in a testing room where the same laptop, mouse, and headphones with microphone were used. The same pseudo-randomisation strategy as described above was used to organise and deliver tasks through the Gorilla interface, maintaining the microphone check to ensure that tasks requiring voice responses could be reliably recorded for later assessment. An experimenter closely supervised the session through a connected screen, mouse and keyboard while facing the participant, intervening only when required by the task or in the case of a technical failure. For tasks requiring a voice response from the participant, the experimenter manually noted accuracy and/or measured completion time in the session booklet. To avoid data loss, such tasks were also audio-recorded and later verified by a native French-speaking team member, when necessary for scoring accuracy. Responses for other tasks were recorded directly in Gorilla. Questionnaires and behavioural tests are outlined below. Any modifications made to the original test protocols are described below; otherwise, test protocols were administered as described in the respective cited literature.

Tasks
Our approach consisted of identifying mostly existing tests to measure the target processes of interest. Below we describe the questionnaires and behavioural tasks used and the cognitive processes they tap into.

Questionnaires
Language Experience and Proficiency Questionnaire - LEAP-Q (Marian et al., 2007): this questionnaire measures multilingual language experience. Its publicly available French adaptation, based on the five-language version adapted by Marilyn Hall (Northwestern University), was used. The questionnaire was extended to accommodate up to 50 languages. We invited participants to include any language or dialect, as well as extinct languages, if relevant. We added questions regarding the amount of time spent in contexts such as online communities, fandoms and subcultures in which a particular language is spoken, as well as in the context of social media, learning apps and everyday life (e.g. administrative tasks), to better gauge other factors that may contribute to language learning. The item on cultural identification was removed. The LEAP-Q had to be completed for each language that was declared as being part of an individual's background.
Code-switching questionnaire (Rodriguez-Fornells et al., 2012): this questionnaire measures code-switching habits. We selected items contributing to the contextual and involuntary switching scores and integrated them into the LEAP-Q pipeline, right before the language-specific questions. Participants were asked to respond based on their average code-switching habits, rather than language by language, as the latter would have been cumbersome to answer and a potential source of oversampling/overfitting. Our own French adaptation was used. Monolinguals did not take this questionnaire.
Motivational Factors Questionnaire - MFQ (Ryan, 2008; Thompson and Lee, 2018): this questionnaire measures motivation and attitude towards foreign languages. The questionnaire was used by the authors in earlier work on language aptitude (Steiner et al., 2021). With the target population of this project in mind, the following adaptations were made: constructs and items assuming that participants are actively involved in studying foreign languages academically were not included; items that refer specifically to the English language and the anglophone world and culture were reworded to fit a more global context where possible, or excluded. Our own French adaptation was used, and a 5-point Likert scale was chosen instead of a 6-point one, to allow for neutral responses.
Adult Reading History Questionnaire - ARHQ (Lefly and Pennington, 2000): this questionnaire measures adults' relationship with reading throughout their lives. It places participants on a spectrum of individual differences in reading, possibly indicating dyslexia (even if undiagnosed) or borderline-dyslexic profiles. We used a French adaptation of the questionnaire.
Music Use and Background Questionnaire - MUSEBAQ (Chin et al., 2018): this questionnaire measures music training, capacity, preferences and motivations. We used our own French adaptation of this questionnaire but kept it otherwise unchanged.

Behavioural tasks
Instead of using a single battery, we collected tests that, in our view, fulfilled the criteria for suitable and successful delivery to this cohort (aimed at adults, relatively short, adapted to or designed for computer-based delivery), and best contributed to the construct itself. Below we outline our motivation for including the tasks in our battery, within each respective domain of processing or conceptual construct.

Language aptitude.
The following tasks were included to test for the classical components of language aptitude, i.e. language analytic abilities, rote learning and phonological coding.
ArtGram: this explicit grammar learning task was developed and piloted by the authors to test morphosyntactic manipulation skills. Participants were invited to learn the provided dictionary and sample sentences, and then to choose the appropriate translation for 12 new sentences by recognising the use of morphological cues. This task is described above in the Pilot study section.
Modern Language Aptitude Test V - MLAT (Carroll and Sapon, 1959, 2002): this test was used to assess rote learning and lexical memory. It required participants to memorise 24 words from an unknown language in 2 min, after which came a practice block in which they were required to choose the correct translation for each word in a multiple-choice test.
Farsi uvular sound production task (Golestani and Pallier, 2007): this task was included to assess the sound production component of language aptitude by asking participants to repeat a particularly difficult sound from an unknown language. Participants' accuracy was evaluated by two native Farsi speakers (see Data cleaning and Scoring).
Hindi dental-retroflex task (Golestani et al., 2002): this task was included to assess the perceptual phonetic component of language aptitude. Participants were trained to hear two artificially generated sounds (male voice), the Hindi dental and retroflex plosives, which form a phonetic contrast that does not exist in French. They subsequently had to categorise sounds as A or B when hearing samples from a seven-step stimulus continuum from dental to retroflex. 200 trials were administered to this cohort.
Brocanto (Kepinska et al., 2017a; Opitz and Friederici, 2003): this task was included to assess language analytic abilities, with an eye towards implicit pattern recognition and implicit grammar learning, a fundamental component of syntax and a skill involved in the general debate around the neurobiological substrates of language learning (Goffman and Gerken, 2020). Participants were required to recognise grammatical and ungrammatical sentences with different structures and violation types in an artificial language, inductively and without reference to the grammatical system or the components underlying the Brocanto language. Training (reading-only) and testing (judging grammaticality by button press) blocks were provided in three phases. In this experiment we delivered a previously developed version of the test (Kepinska et al., 2017a), with minor modifications related to timing and condition counterbalancing, as follows: a practice screen was added at the beginning of each testing block to test button presses and rule comprehension; button-press instructions were 1 for "correct sentence" and 0 for "incorrect sentence" on the upper numerical keys; the fixation cross was displayed for 2 s, without jittering, during resting periods; sentence presentation in both testing and training blocks lasted 6 s; we maintained 40 sentences per testing block, but included 8-word sentences and an equal split of grammatical and ungrammatical sentences overall (N = 120, of which the 60 ungrammatical sentences were new, and about one third of the 60 grammatical ones [N = 22] had been presented during training). After this task, we inserted a 3-minute break to allow for recuperation.

Domain-general cognition.
The following tests were included to assess more general aspects of cognition, thought to contribute to the spectrum of individual differences in language aptitude (see e.g. Li et al., 2019; Udry and Berthele, 2021 for studies supporting the association between general cognitive ability and language aptitude).
Raven's Advanced Abridged Progressive Matrices - APM (Raven, 1998): in this widely used test of non-verbal, fluid intelligence, participants were invited to select the missing block to complete a picture set. The advanced, abridged French version, with a time limit of 20 min for 23 trials, was administered to this cohort. This harder version was chosen to let individual differences at the two ends of the continuum emerge more easily. Our version was programmed for computer-based presentation on the Gorilla platform instead of paper and pencil, but was otherwise unchanged.
Corsi blocks (Arce and McMullen, 2021; Corsi, 1972): we administered this widely used test of visuospatial working memory in a 2-dimensional, computer-based version, unlike the original. The literature shows good reliability between computer-based and in-person versions of this test (Brunetti et al., 2014; Siddi et al., 2020). The blocks were presented as 9 black squares arranged on a white background. Sample block sequences were shown in yellow, and selected blocks were shown in green upon clicking. The block layout followed Kessels et al. (2000).
Digit span (Ryan et al., 2014; Wechsler, 2008): forward and backward versions of this task of verbal working memory were administered through headphones, and responses were collected via the upper numeric keypad.
Attention Networks - ANT-I (Callejas et al., 2005): this test is a classic assessment of attention networks (executive control, alerting, orienting). Participants were invited to detect arrow orientation in the presence of flankers and sound cues, with 18 possible conditions (interactions of arrow direction with respect to flankers, and presence and position of sound cues), for a total of 432 trials. After this test, we inserted a 5-minute break to allow for recuperation.
California Verbal Learning Task - CVLT (Deweer et al., 2008): for this task of episodic verbal learning, we used the validated French version, with short-term and long-term recall (without the 'cued-recall' conditions). Participants were invited to listen to two lists of items and recall them as best as they could, repeating them to the experimenter right after the listening phase for short-term recall, and then after 20 min, without any cues, for long-term recall.
Finger tapping test (Ashendorf et al., 2015; Strauss et al., 2006): to test motor speed, participants were invited to tap with the index finger of the dominant and then non-dominant hand on the spacebar as fast as possible. We ran an abbreviated version of the Finger Tapping test (Ashendorf et al., 2015), with 5 trials per hand and a 1-minute break between the third and fourth trial.
Purdue Pegboard Test (Tiffin and Asher, 1948): to test dexterity, participants were invited to insert pegs into holes, first with the dominant, then with the non-dominant hand, and finally with both hands. The final test consisted of building an assembly of the small tools used before, while alternating hands to place them one on top of the other.

Memory.
The following tests were added to assess the procedural/declarative memory systems, in conjunction with our tests of implicit syntactic pattern learning and explicit manipulation of morphosyntax.
Serial Reaction Time - SRT (Earle and Ullman, 2021; Lum and Bleses, 2012): this task of procedural memory required participants to reproduce the position of a series of stimuli (smiley faces) appearing in sequence over four designated locations in a diamond-shaped grid, by pressing the arrow key corresponding to the position of each stimulus. We presented six blocks in total, with a random-sequence block at the beginning and one at the end. Through implicit learning mechanisms relying on procedural memory, the learning of non-random sequences should make reaction times faster throughout the task. This task is a modification of Nissen and Bullemer (1987).
DecLearn (Hedenius et al., 2013; Reifegerste et al., 2021): this test of explicit learning through declarative memory had an encoding phase and a recognition phase. During the encoding phase, participants determined whether an object appearing on-screen was real or not. After 10 min, in the recognition phase, participants were presented with both novel and already seen images, and they were asked whether each item had already been presented or not.

Domain-specific cognition.
The battery included tests of mathematics (arithmetic skills) and musicality (rhythmic and tonal skills) as specific domains of cognition that are tangential to language, possibly sharing cognitive resources via specific common mechanisms or via the general executive system.
Revised Tempo Test - RTT (Bellon et al., 2022): this test, administered to assess arithmetic skills, required participants to solve 60 additions and 60 subtractions in 1 min each. Responses in this computer-based version were collected via the upper numeric keypad.
Advanced Measures of Music Audiation - AMMA (Gordon, 1989): this task, widely employed in the assessment of musicality in the higher education system, was administered to test tonal and rhythmic identification, and was chosen due to its relative difficulty (of note, we remind readers that this cohort was composed entirely of non-musicians). Participants were invited to judge difference or identity in melody or rhythm between pairs of musical excerpts. After this task, we inserted a 3-minute break to allow for recuperation.
Literacy and literacy mediator skill assessment.
In this study, we chose to include dyslexic participants as representative of one of the possible groups with relatively lower language aptitude in some domains (e.g. reading, phonology), and given the large cohort, we also expected that some undiagnosed participants would take part. We therefore included assessments of the main skills known to be impaired in dyslexia, i.e. naming automatisation, reading, spelling, and phonological awareness. Inclusion of these tests was instrumental in allowing us to evaluate how and where these participants lie in the highly intertwined network composing what we believe is the language aptitude spectrum. To safeguard the ecological validity of interactions, participant compliance, and data quality, these tests were all administered in person. All voice responses were audio-recorded and, where relevant, an experimenter timed test performance using a chronometer or the spacebar of a second connected keyboard (in the case of the spelling task). A pre-recorded voice from an AI voice reader (see Tasks) delivered the stimuli, unless otherwise specified.
Literacy skills. Text reading: in this test, participants were invited to read aloud, as fast and accurately as they could, two texts of increasing difficulty, "Le Pollueur" and "L'Alouette" (Gola-Asmussen et al., 2010; Lefavrais, 1967). Participants who did not manage to finish reading in 3 min were stopped by the experimenter.
Word and pseudoword reading: this test was created by merging two standardised dyslexia assessments in French, the ECLA16+ (Gola-Asmussen et al., 2010) and the EVALEC (Sprenger-Charolles et al., 2005). We included words from the EVALEC to make the overall test harder than the ECLA16+ alone. There were overall 56 regular words, 52 irregular words, and 56 pseudowords. For each category (regular, irregular and pseudowords), we took 20 stimuli from the ECLA16+ and the remaining from the EVALEC (36 regular words, 32 irregular words, 36 pseudowords). The stimuli from the EVALEC, originally presented in a mixed fashion, were randomised within each category (i.e. list) to allow timing and scoring performance for each category separately.
Spelling (Gola-Asmussen et al., 2010): in this task, part of the ECLA16+ assessment battery, participants were invited to write down words, pseudowords and irregular words on paper after hearing them.
Mediator skills. Rapid Automatised Naming (RAN) (Frederickson et al., 1997): naming automatisation was tested for the categories of objects, colours and digits. Participants were invited to accurately and rapidly name the stimuli after a practice trial, going in reading direction (left to right) and line by line. Two blocks of trials were administered per item type.
Phoneme suppression: in this task, developed in-house, participants were invited to repeat pseudowords while omitting the first phoneme. Pseudoword list creation through the project "Lexique" (New et al., 2001, 2004) is described in Rutten et al. (2019).
Spoonerisms (Szenkovits and Ramus, 2005): in this test of phonological awareness, participants were invited to listen to word pairs and repeat them aloud by swapping the first phoneme.In this case, word pairs were read aloud by a pre-recorded male voice.

Data cleaning and scoring
Questionnaire responses and behavioural task logfiles were preprocessed and cleaned in Python. Scores were calculated, when relevant, according to the individual protocol of each task or questionnaire. For the purposes of this analysis, from each questionnaire and task, we selected the scores shown in Table 1.
Questionnaire preprocessing. For each questionnaire, we removed potentially identifying or irrelevant metadata. String responses were recoded as numbers, and a tabular file was created with items and scores of all participants. Missing data were addressed a priori by forcing responses within questionnaires, with on-screen warnings for unfilled responses, and by closely supervising the completion of the entire Qualtrics pipeline. This approach ensured the inclusion of all questionnaires for all participants in the dataset, except for monolingual participants, who did not take the code-switching questionnaire. Ad-hoc procedures were put in place to derive dimensions from the LEAP-Q and the MFQ, as follows.
LEAP-Q dimensions. The LEAP-Q is a widely used instrument to assess multilingual experience, from contexts of use, to learning, use choices, history with the language, and native-likeness. In this wide exploration of language and cognition data, having a composite measure of multilingualism from the LEAP-Q would have been ideal. Nevertheless, the LEAP-Q authors themselves suggest that it is best not to combine the items but rather to use the LEAP-Q to obtain a qualitative description of an individual's history with a certain language (Marian et al., 2007). In our case, as this strategy was impossible to implement in the network analysis pipeline (or in any quantitative statistical pipeline), we decided to extract a measure reflecting participants' multilingual experience. Per participant, 4 different continuous 'multilingualism scores' were calculated based on the Language Experience and Proficiency Questionnaire (LEAP-Q, Marian et al., 2007). Three of them were based on participants' reported proficiency in the domains of (a) speaking, (b) reading and (c) comprehension of all the reported languages, and one was based on current exposure to all reported languages. Following previous work (Gullifer and Titone, 2020; Kepinska et al., 2023), each domain's multilingualism score combined the proficiency or exposure scores for the different spoken languages using Shannon's entropy equation (Shannon, 1948). The calculations were performed using the entropy R package (Hausser et al., 2012).

Table 1
List of tasks, their delivery, and their labels.

MFQ factor analysis. To reduce dimensionality within the MFQ while still including all the possible dimensions driving motivation to learn foreign languages, we performed exploratory factor analysis (EFA) to assess dimensionality (Shrestha, 2021).
Two factors were identified: factor 1 revolved around the idealisation of foreign languages, while factor 2 was ascribed to the general confidence of participants with foreign languages and their willingness to actively employ them.Details of this procedure can be found in the Online Supplementary material.
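Since the LEAP-Q-derived entropy scores play a central role in the later analyses, the computation is worth making concrete. The sketch below is a minimal Python analogue of the procedure (the study itself used the entropy R package); the function name and the base-2 logarithm are our own illustrative choices, while the proportions-then-entropy logic follows Shannon (1948).

```python
import numpy as np

def multilingualism_entropy(scores):
    """Shannon entropy over a participant's per-language scores.

    `scores` are self-reported proficiency (or exposure) values,
    one per reported language; they are first normalised to
    proportions, and entropy is computed over those proportions.
    """
    p = np.asarray(scores, dtype=float)
    p = p / p.sum()          # convert raw scores to proportions
    p = p[p > 0]             # by convention, 0 * log(0) = 0
    return float(-np.sum(p * np.log2(p)))

# e.g. a participant reporting speaking proficiency 10, 6 and 4
# across three languages:
h = multilingualism_entropy([10, 6, 4])
```

A participant reporting engagement with a single language scores 0, while equal engagement with n languages scores log2(n), so higher values indicate a more balanced multilingual repertoire.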
Task preprocessing. For each task, the log file and audio recordings were retrieved from the Gorilla cloud. Each task was then scored according to its own protocol, using Python. As in the case of questionnaires, a priori measures were taken to avoid missing data: namely, the close supervision of both online and in-person sessions and the forced progression through the assigned task pipeline prevented skipped tasks. The recording of voice responses allowed us to recover trials potentially lost due to live interaction (missed timing or scoring by the experimenter, language incomprehension, etc.), by re-listening to and re-scoring any necessary trials or tasks. Nonetheless, a few cases arose when technical malfunctioning caused missing responses: in the case of the Farsi uvular task, especially, a sometimes unstable internet connection caused the loss of a few trials. In this case, the average score was calculated on the number of trials that were successfully rated. Two independent native Farsi speakers then scored the task by judging participants' native-likeness in reproducing the target sound.

Reliability
Internal consistency analysis is a fundamental step in behavioural research to ensure that the metrics used are reliable, valid, and capable of providing accurate measurements of the constructs being studied (Brysbaert, 2024). If a behavioural metric is not internally consistent, its ability to accurately reflect the construct it intends to measure is compromised. In this study, given the wide array of metrics used, and even though all but one of them are available and documented in the psychological and language aptitude literature, we computed Cronbach's alpha in our own sample to determine whether the items consistently measured the same underlying construct, which we could then assess as a node in the exploratory graph analysis: we refer to this process, more generally, as 'reliability'.
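For readers less familiar with the metric, Cronbach's alpha compares the sum of the item variances with the variance of the total score. The following is a minimal Python sketch of that formula (the function name is ours; the study's own computations are documented in the Supplementary Materials):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_participants x n_items) matrix.

    alpha = k/(k-1) * (1 - sum(item variances) / var(total score)),
    where k is the number of items.
    """
    X = np.asarray(items, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)        # variance of each item
    total_var = X.sum(axis=1).var(ddof=1)    # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```

Perfectly covarying items yield alpha = 1, while independent items yield values near 0, which is why low alpha flags metrics whose items may not tap a single construct.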
For two metrics, item-level consistency could not be computed due to data structure (Spoonerisms and Phoneme suppression were measured at the task level). For others, alternative procedures were more suitable: this is the case for the Motivation questionnaire, for which we report factor analysis; the ANT-I task, for which we report split-half reliability with Spearman-Brown coefficients validated over 10,000 random splits (Thompson et al., 2010); and the Farsi uvular sound production task, for which two independent raters evaluated the data and inter-rater reliability was computed (Golestani et al., 2007). Finally, for the entropy measures, internal consistency was not expected, given that each participant reported their competence in speaking, comprehending, or being exposed to a varying number of languages, at varying and not necessarily consistent levels. More details and the code of all reliability analyses, as well as the factor analysis code for the MFQ, can be found in the Online Supplementary Materials (https://osf.io/8ar2x/).
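The permutation-based split-half procedure used for the ANT-I can be sketched as follows. This is an illustrative Python version following the logic of Thompson et al. (2010), not the study's actual code; the function name, seeding, and averaging over splits are our own choices.

```python
import numpy as np

def split_half_reliability(items, n_splits=10_000, seed=0):
    """Split-half reliability with the Spearman-Brown correction.

    For each random split of the items into two halves, participants'
    half-scores are correlated, the correlation is stepped up with
    r_sb = 2r / (1 + r), and the estimate is averaged over splits.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(items, dtype=float)
    k = X.shape[1]
    coefs = []
    for _ in range(n_splits):
        idx = rng.permutation(k)
        a = X[:, idx[: k // 2]].mean(axis=1)   # first half-score
        b = X[:, idx[k // 2:]].mean(axis=1)    # second half-score
        r = np.corrcoef(a, b)[0, 1]
        coefs.append(2 * r / (1 + r))
    return float(np.mean(coefs))
```

Averaging over many random splits avoids the arbitrariness of a single (e.g. odd/even) split, which matters for tasks like the ANT-I whose trials are heterogeneous across conditions.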

Data modelling strategy
The empirical goal of this cross-sectional study was to identify patterns of linear associations among the many different manifest scores gathered via the test battery and the background questionnaire data described in the previous section. Two main techniques are available to do this: factor analysis (FA) and network analysis (NA). Factor analysis has been used extensively in language aptitude research (starting from the classic development of the aptitude tests, as in Carroll (1958), to more recent analyses, as in Udry et al. (2021)). Factor analysis presupposes the existence of latent constructs, of which test results and questionnaire-based information are manifestations. These constructs may be considered completely orthogonal, or still correlated to some extent (if oblique rotations are used). Network analysis, on the other hand, models linear associations (usually in the form of partial correlations, so-called edges) among all measured variables (so-called nodes), and does not necessarily make assumptions about the existence of latent constructs. It is important to note, however, that NA is not categorically incompatible with the idea of latent variables or other types of sets of correlated metrics that represent meaningful psychological categories. As argued in Martin et al. (2019), it is neither necessary nor useful to oppose NA and FA in this respect; what seems more promising is to use ideas from both analytical techniques to account for more strongly correlated clusters of manifest metrics and for pairwise associations at the same time.
We therefore applied a modelling strategy that does exactly that. We first explored the data using a regularised network analysis, as discussed in Golino (2020). As a part of this analysis, we used a community detection algorithm to explore clusters of variables that were grouped together based on high mutual linear associations. We use the term cluster to refer to the output of this algorithm. We then fitted a confirmatory factor analysis to the structure that emerged from the network analysis, and attempted to optimise it by including pairwise correlations. Finally, as the sensitivity of the regularisation metric can be set to different levels, we ran a check of the robustness of the cluster structure found in the first analysis, by fitting a multitude of regularised networks with different sensitivity parameters as well as a meta-network analysis of the clusters produced by them. In a last step, we fitted another network analysis with a clustering algorithm on the set of networks generated in the previous steps. We refer to the groupings of variables yielded by this analysis with the term meta-clusters.

Data pre-processing
We selected 35 different metrics, as discussed in the Tasks section and in the Supplementary material. Missing data (see Online Supplementary material for details) were not imputed, and all scores were z-transformed before further analyses.
We report the bivariate Pearson correlations between all metrics for all data (see Supplementary material for more details). Visual inspection of the estimates in Fig. 1 reveals clusters of strongly associated measures, e.g. among the reading-related and the multilingual experience-related entropy scores, quite unsurprisingly.
No other metrics were so strongly associated as to advise against graph analysis.

Participants' education and language background
Being relatively multilingual and well-educated, our target population was quite representative of the social milieu of Switzerland.
Participants' socioeconomic status is reported in Fig. 2 (Barratt, 2006; Rakesh and Whittle, 2021). Their scores on the visual and verbal working memory tests are representative of reported norms for the young adult-to-adult age range on the WAIS-IV Digit Span (DS) score and the forward Corsi block span (Roivainen, 2019; Wisdom et al., 2012). Similarly, their scores are in the 29th percentile of fluid intelligence scores as reported by the APM manual (of note, we remind readers that most of these tests were not developed for computer-based administration, and no norms exist for such versions). Overall, participants tolerated the intensive computer-based session quite well: this study began shortly after the last COVID-19 pandemic restrictions were lifted in Switzerland, and we found that all participants had a good understanding of online interactions and that they managed Sessions 1 and 2 well, possibly partly due to increased experience with online interactions during the pandemic.
Language backgrounds were quite varied, as is typical in the multicultural and multilingual urban Swiss society. Overall, 90.8 % of participants (N = 138) used French as their most dominant language, and for 78.3 % of them (N = 119) French was also their first language in order of acquisition. We provide language background data for the three most dominant languages below (Fig. 2a). As concerns their knowledge of French, participants reported good levels even when French was not their first or second most dominant language. Overall, the sample was quite homogeneous regarding the dominant L1 and the general level at which participants mastered French; good French knowledge was fundamental for test performance (e.g. understanding task instructions, performance on the linguistic tasks), and comparable levels across participants were important in allowing us to reliably compare performance. Participants for whom French was not the dominant language (N = 14) were all polyglots who heard about the study through language clubs and associations, and were recruited, provided they had good mastery of French, to populate the upper end of the language experience continuum. Regarding second and third language data, English was dominant, along with heritage languages spoken in families and other national languages of Switzerland that are part of the school curriculum (Italian and German). Given the above information, we believe that our results reliably reflect the nature of the constructs being investigated.

Reliability
Z-score standardised distributions, basic descriptive statistics, and reliability (internal consistency) data are reported below for the full sample (Table 2). Reliability was acceptable (≥ 0.7) to optimal (> 0.9) for all metrics, except for explicit grammar (likely due to the low number of trials and the novelty of the task, which might warrant further refinement), musicality and musical training (likely due to people with no formal musical training having inconsistent and variable musical experiences, e.g. listening to music but playing no instruments, or displaying erratic practice habits), the Hindi task (likely due to its inherent difficulty) and visual working memory (likely due to the online version of the Corsi Blocks task).
As concerns the ANT-I task, reliability measured via the split-half Spearman-Brown coefficient was mediocre for the alerting and orienting scores, and good for the inhibition score. Similar results have previously been reported (Habekost et al., 2014; MacLeod et al., 2010; Salthouse and Hedden, 2002). Inter-rater reliability for the Farsi task was acceptable, and identical to previous reports (Golestani and Pallier, 2007).
Regarding the composition of the sample, reliability was mostly  and Szenkovits, 2008). We observed two cases where internal consistency in the dyslexia sample was lower than in the typical readers but still acceptable: this is the case for CVLT Recognition (dyslexics: α = 0.67; typical readers: α = 0.79), possibly due to the presence of verbal working memory problems in dyslexia (Kramer et al., 2000). Similarly, dyslexics were less consistent in the Purdue Pegboard test (α = 0.66) than typical readers (α = 0.85), which could be expected given that dyslexia might be associated with motor disruptions (Marchand-Krynski et al., 2017; Nicolson et al., 1999), although the nature and causality of these relationships are debated (Decarli et al., 2024; Ramus et al., 2003). Data on the distribution and reliability of the split samples can be found in the online Supplementary Materials (https://osf.io/8ar2x/file behpaper-supplementary-distribution_reliability_data_split_samples.pdf, Fig. 2S, Tables 1S and 2S).

Regularised network analysis with intermediate sensitivity
We first discuss the output of our psychometric network analysis via exploratory graph analysis (EGA) with regularisation (Epskamp et al., 2018). The goal of the regularisation algorithm is to eliminate spurious correlations. We used the EGAnet package (Golino and Christensen, 2023) with the walktrap algorithm (Golino et al., 2020, p. 295) to detect clusters (our term for what is interchangeably referred to as communities, groups, dimensions, or latent variables in the literature) in the network. We set the hyperparameter (gamma) of the regularisation function (EBICglasso) to 0.25, which corresponds to an intermediate sensitivity. The setting of this parameter is somewhat arbitrary: higher settings raise the threshold for correlations to be maintained and are thus more cautious. Such a higher threshold yields an increasing number of nodes that lack any noteworthy association with other nodes in the network and that are therefore not part of any cluster. Lower settings, on the other hand, foster the discovery of relations (Epskamp and Fried, 2018). Starting with an intermediate setting of 0.25 seemed adequate, as it has been chosen in similar investigations (see e.g. Hintz et al., 2024, p. 11 for a recent example).
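To give a concrete sense of what the network estimation does, the following Python sketch derives a partial-correlation network from z-scored data. It is a conceptual analogue only: EGAnet's EBICglasso applies a graphical-lasso penalty whose strength the gamma hyperparameter tunes, whereas here, for brevity, we invert the correlation matrix and apply a hard threshold as a crude stand-in for that penalty; the function and parameter names are ours.

```python
import numpy as np

def partial_correlation_network(X, threshold=0.05):
    """Unregularised partial-correlation network (conceptual sketch).

    Edge weights are partial correlations obtained by standardising
    the precision (inverse correlation) matrix:
    pcor_ij = -prec_ij / sqrt(prec_ii * prec_jj).
    Weak edges are zeroed out, crudely mimicking a sparsity penalty.
    """
    R = np.corrcoef(X, rowvar=False)
    prec = np.linalg.inv(R)
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)
    np.fill_diagonal(pcor, 0.0)
    pcor[np.abs(pcor) < threshold] = 0.0   # stand-in for the gamma-controlled penalty
    return pcor
```

Raising `threshold` here mirrors, very loosely, the effect of raising gamma: more edges are pruned and more nodes become isolated.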
The exploratory graph analysis with the walktrap clustering algorithm yielded five clusters.Six variables were not part of any cluster.
The largest cluster (1) covers tasks that pertain both to the language-oriented aptitude tasks and to general cognitive and learning-oriented tasks. It also includes motor skill tasks (dexterity, motor speed). The second-largest cluster (3) involves reading tasks (some of which are explicitly designed for dyslexia screening) as well as a phonological suppression task (also used for dyslexia screening). Moreover, a questionnaire item asking about reading difficulties in childhood is associated with this cluster. A cluster of five variables (4) consists of metrics for the individual multilingual language experience repertoire (entropy of speaking, understanding, and current exposure to languages) as well as the two motivational dimensions discussed in the Tasks section above. Two additional pairs of variables cluster together: first, 'musicality', and second, the questionnaire items tapping into code-switching behaviour.
A total of six variables are not associated with any of these dimensions. Moreover, the analysis shows various, sometimes rather strong correlations across cluster boundaries (the stronger the lines, the higher the absolute partial correlation coefficient), in particular between cluster (1) and cluster (3).
Fig. 4 shows the same analysis for the data without the participants with a previous dyslexia diagnosis. This analysis yielded more clusters: what previously was one cluster (1 in Fig. 3) is now grouped into a larger (1) and two smaller (2 and 4) clusters of variables. Some differences are also visible in the reading-related variables: while there is still a cluster with reading tasks (6), not including phonological tasks in this case, the questionnaire item now groups with the two motivational factors in cluster (7). Musicality is again a separate cluster (3), as is code-switching (8).
The analyses show that there is indeed evidence for more strongly correlated clusters of variables that seem to be interpretable and which correspond to tasks and questionnaire items that bear an obvious relationship to each other, more strongly so in the case of the complete data set.
Following the logic of the approach discussed by Martin et al. (2019), we then fitted a confirmatory factor analysis to the data, using the groupings yielded by the NA above as latent constructs. Moreover, we included the strongest cross-cluster correlations in the NA (above the threshold of 0.1) in the model, in the form of additional covariances between manifest variables (see Online Supplementary material for details). This approach is roughly similar to the use of the modindices() function in lavaan, as it increases the model fit by adding parameter estimates that were not part of the initial model. We only include correlations between manifest variables and do not modify the initial associations of metrics with clusters here, as even such more drastic changes to the model structure do not yield a much better model fit (see Online Supplementary material for more details on the output of the modindices() function). Table 3 lists these cross-cluster correlations. It is noteworthy that each of these cross-cluster correlations includes one of the variables in the reading-related cluster 3. Fig. 5 shows that some of the factors (ellipses, modelled based on the clusters in the NA) are mutually related: for example, the largest cluster (1) in the NA (now termed Cluster1) and the reading-related cluster (Cluster3) are associated with a moderate estimate (−0.62). Also, musicality (Cluster2) and Cluster1 are moderately associated (0.41). Interestingly, this analysis also shows that the language experience cluster (Cluster4) is not strongly associated with any of the other clusters.
It must be pointed out that the confirmatory factor analysis is not a good fit to the data. The fit metrics (RMSEA = 0.083, CFI = 0.805, NFI = 0.734) point to a mediocre (root-mean-square error of approximation) to unacceptable (comparative and normed fit index) fit. This means that reducing the associations in the data to the five factors that correspond to the clusters in the network analysis, even if additional covariances between manifest variables are modelled, does not represent the data well.

Network analysis 2 (loop through gammas)
As the setting determining the sensitivity of the regularisation algorithm is admittedly arbitrary, drawing wide-ranging conclusions from the structure that emerges from a specific model seems incautious. We therefore decided to run a robustness check of the structure discussed in the previous section. To this end, we ran an identical EGA analysis with walktrap cluster detection with various gamma hyperparameters. In total, we applied the algorithm 20 times to the whole data set and another 20 times to the data set without dyslexics, across the range from 0.025 to 0.5. Unavoidably, the stricter the setting, the more variables are isolated from one another. In the case of gamma = 0.5, a total of 13 variables (all data) or even 26 (without the dyslexics) are not part of any cluster (see Online Supplementary material for details).
This exploration allowed us to compare the network structure across a wide range of sensitivity settings. The more often variables are grouped together, the more robust their empirical association is, and the more likely they are to represent manifest metrics of what corresponds to a latent construct in FA. As an example, in the full data set, paired associate learning, fluid intelligence, episodic verbal learning and verbal learning are grouped together in all EGA analyses. The same applies to other variables, such as the scores for musicality and for musical training experience. Also, many reading-related variables seem robustly clustered, both for the full data set and for the subset without the dyslexics.
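This robustness check reduces to counting, across the 20 gamma settings, how often each pair of variables lands in the same cluster. A minimal Python sketch of that bookkeeping (the names are ours; isolated variables are coded as -1, mirroring nodes that drop out of all clusters at strict settings):

```python
import numpy as np

def co_clustering_matrix(partitions, n_vars):
    """Count how often each pair of variables shares a cluster.

    `partitions` is a list of cluster-assignment vectors, one per
    gamma setting; label -1 marks an isolated (unclustered) variable.
    The result is normalised to [0, 1].
    """
    counts = np.zeros((n_vars, n_vars))
    for labels in partitions:
        for i in range(n_vars):
            for j in range(n_vars):
                # only count pairs where both variables share a real cluster
                if labels[i] != -1 and labels[i] == labels[j]:
                    counts[i, j] += 1
    return counts / len(partitions)
```

Pairs with values near 1 are grouped together at nearly every sensitivity setting, which is the sense of "robust association" used in the text.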

Meta network analysis
As a final analytical step, we explored the output produced by the analysis across the gamma parameter range in what we term a "meta-network analysis". Figs. 6 and 7 show the heatmaps of the total number of times any two given variables are grouped together (normalised to the interval [0, 1]).
The matrix of shared cluster membership can be displayed as a network with the igraph package (Csárdi et al., 2024). Community detection was done with the "fast and greedy" algorithm (Ognyanova, 2016). Figs. 8 and 9 show the two "meta-networks" for the full and the partial data sets; Tables 4 and 5 are the legends for the two figures, listing the variables and their respective cluster membership.
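The meta-network construction can be sketched in Python as well. The study used igraph's "fast and greedy" algorithm; networkx's greedy_modularity_communities implements the same Clauset-Newman-Moore greedy modularity maximisation, so we use it here as a stand-in. The function name and data layout are our own illustrative choices.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def meta_clusters(co_membership, names):
    """Detect meta-clusters in a shared-cluster-membership matrix.

    Edges are weighted by the normalised co-clustering frequency;
    pairs that never shared a cluster get no edge at all.
    """
    G = nx.Graph()
    n = len(names)
    for i in range(n):
        G.add_node(names[i])
        for j in range(i + 1, n):
            if co_membership[i][j] > 0:
                G.add_edge(names[i], names[j], weight=co_membership[i][j])
    # greedy modularity maximisation (Clauset-Newman-Moore)
    return [set(c) for c in greedy_modularity_communities(G, weight="weight")]
```

The resulting communities are the "meta-clusters" discussed below: groups of variables whose co-membership survives across the whole range of regularisation settings.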
The meta-network for the full sample confirms the main findings of the gamma = 0.25 EGA: musicality appears as a separate cluster, and the reading-related variables cluster together, although here they are grouped with specific additional psychometric variables (inhibition, serial.rt.unconsc). There is a large cluster (2) covering language-related and cognitive variables. And finally, language experience is grouped together, now also with the two questionnaire items on code-switching.
Table 3
Inter-cluster correlations between variables that are > 0.1, all data. The covariance and the standardised estimates stem from the confirmatory factor analysis (see supplementary material and Fig. 5 below for details).

The meta-network for the data without the dyslexics, as in earlier analyses, yields a comparatively higher number of clusters. Again, three reading-related variables cluster together. As in earlier analyses, language experience forms a separate group. Musicality, for once, is not separate here, but is grouped with a phonological discrimination task and with inhibition. As in the full sample analysis, there is not one big cluster with cognitive and language-related constructs, but two, (3) and (4).

General cognition and language
The different analyses of our set of variables above show a certain variability with respect to the clustering across the two samples (all data vs. data without the dyslexics), as well as across the different parameter settings for the regularisation algorithm. One aspect nevertheless seems robust across these analyses: there is no evidence for a clear-cut segregation of language-oriented aptitude variables and metrics of more domain-general cognitive abilities. The dedicated tasks that are adapted from traditional language aptitude tests (implicit.grammar, explicit.grammar, paired.asso) systematically team up with metrics for fluid intelligence and working memory. They may not always all end up in the same cluster (although in our paradigmatic example with gamma = 0.25 for all data they do, cf. Fig. 3), but they always cluster together with other, non-language-oriented variables. This finding is confirmed by our meta-clustering analysis (see Figs. 8 and 9): for the whole data set, all language aptitude metrics end up in the same cluster as most other general cognitive metrics (e.g. working memory, intelligence, amongst many others).
Whether this is a surprising finding or not depends on the theoretical assumptions regarding the nature of linguistic competence and (second) language acquisition: if strong modularity is assumed, as is the case in some generativist takes on first and second language acquisition and on the mind in general (Truscott, 2017), the systematic association of language-oriented and general metrics found in our data calls for an explanation. More generally, if (first or second) language acquisition, at least within the so-called critical period (see Birdsong, 1999 for a critical discussion), is governed by entirely different mechanisms than 'ordinary' learning, then language aptitude and general cognition should be correlated at relatively low levels. We are aware that building up a straw man is not useful, since current-day generativist acquisition research indeed acknowledges the existence of interfaces between the language module(s) and other cognitive modules (Juffs, 2011), which could at least partially account for the intertwined associations of the two 'modules' that emerge from our data sets.
Nevertheless, our data are certainly more in line with a view of (second) language acquisition as a learning process that draws heavily on domain-general abilities (which by no means categorically excludes the existence of domain-specific abilities for language learning). The emphasis on domain-general learning abilities is in line with other work on language aptitude in adults (Li et al., 2019), and it also ties in with work on young learners: in samples of children learning French and English as a foreign language, there was a similar integrated main factor on which both language aptitude tests and general cognitive metrics loaded (Udry and Berthele, 2021). Our best interpretation of these findings is that language aptitude is to a great extent part of an extended positive manifold (Borg, 2018).

Memory systems
Recent models have linked language to higher-order procedural/implicit and declarative/explicit memory (Ullman, 2001, 2004, 2016, 2023), and to a general mechanism of inductive pattern learning based on recognising statistical co-occurrences and transitional probabilities (Chang et al., 2012; Goffman and Gerken, 2020). Moreover, the contribution of mnemonics is crucial for the acquisition, retention and processing of (un)familiar material, as well as for organising it into structures: in this sense, some have argued that working memory (part of executive control) is the ultimate mechanism binding phonetic coding, language analytic skills and rote learning (Miyake and Friedman, 1998). In our task battery, we included tests that tap into all these aspects of memory, but the interpretation of the relationships that have emerged is non-trivial.
In the whole-sample network analysis, at baseline gamma = 0.25, many of the general cognition and language-related variables cluster together (as explained in the General cognition and language section). Here, the discussion will focus on the role of memory systems. The first, most general result, across gamma parameters and across clusters, is the clustering of several memory and pattern recognition variables together with implicit and explicit grammar ones, which suggests that the human language capacity is highly compositional, building upon other skills (Yang et al., 2019).
Explicit/declarative memory likely supports the manipulation of morphosyntax, learning new words, and recalling word lists and previously seen pictures, tested in our battery by the following measures, respectively: the explicit.grammar, paired.asso, episodic.verbal.learn and decla.mem variables. Our fluid intelligence variable (fluid.int, measured via Raven's matrices) is also present in this cluster, as is implicit.grammar, measured via the Brocanto task: these tasks share a pattern recognition component, one visual and one syntactic, and are heavily inductive. The fact that memory variables cluster together is in line with the declarative-procedural model in the context of native language processing, where explicit/declarative memory is responsible for the learning and storage of lexical information and where implicit/procedural memory is involved in grammar learning (Ullman, 2001).
The declarative-procedural model would have predicted some degree of separation between explicit and implicit memory processes, at least for foreign language learning, which we did not observe in our data (at the default gamma value and in the whole sample). Moreover, we would have expected implicit/procedural memory skills (the serial.rt.unconsc variable) to be associated with performance on the artificial grammar task (implicit.grammar, measured via the "traditional" Brocanto task). Indeed, serial reaction time tasks have been extensively compared to artificial grammar learning in terms of their implicit/procedural component (Chang et al., 2012), and they have been linked to working memory (Bo et al., 2011). In our results, however, the serial reaction time task (serial.rt.unconsc) was completely uninformative regarding possible procedural/implicit memory links to language learning, given that it mostly remained isolated from any other clusters, except at gamma levels from 0.025 to 0.1 in the whole sample. These are, however, very low parameter levels, at which it is not advised to draw any wide-ranging conclusions. At the baseline gamma = 0.25, in the typical reader sample, we observed several smaller clusters and more isolated variables: importantly, declarative memory and implicit grammar learning make their own cluster, as do arithmetic and verbal working memory. As a general interpretation, this might be ascribed to typical readers carrying out cognitive tasks in a modular, specialised (and efficient?) fashion. In addition, the clustering of declarative memory and implicit grammar might reflect the way syntactic patterns are processed in the initial phases of learning a new language, at least in typical readers: despite not being cued as to the structure of the artificial language, participants' capacity for explicit memory might have driven their grammaticality judgments more than (or together with) the actual inductive learning of the syntactic hierarchy. This possibility could have been assessed with a retention test (Kepinska et al., 2017a) or with the collection of reports on learning strategies, neither of which was implemented in our study.
We will now discuss the arithmetic and motor variables with respect to both samples at gamma = 0.25. There is a known working memory component to basic arithmetic processing, and some have argued that this happens specifically through verbal working memory, with many neuroimaging studies investigating the role of the angular gyrus as a hub for this process (Zarnhofer et al., 2012). Our arithmetic and verbal working memory tasks cluster together at the default gamma = 0.25 in the whole sample, and even make their own cluster in the typical readers: this might provide more evidence for the verbal grounding of basic arithmetic operations, despite the lack of post hoc strategy reports on the way in which our participants carried out the arithmetic task. Moreover, the fact that these two variables make up their own cluster in the typical readers might, again, attest to the fact that these participants channel their cognitive resources more efficiently. Finally, finger tapping and dexterity have been shown to be related to general intelligence and executive skills (Alhamdan et al., 2023; Kanj et al., 2022; Schear and Sato, 1989), and to language ability (Obeid and Brooks, 2018). Our results in the network analysis and the meta-network for both samples are in line with these observations.
As expected, at higher gamma levels, both samples start showing more isolated variables and smaller clusters. In the whole sample, at gamma = 0.35, declarative memory becomes isolated from the bigger cluster. At gamma = 0.375, arithmetic and verbal working memory make up their own cluster, similarly to the effect observed in the typical reader sample. Beyond this level, the system starts breaking apart and becoming less interpretable. In the typical reader sample, this happens already at gamma = 0.3, where the general cognition and language cluster loses the working memory and explicit grammar variables.
Meta clusters (reported in the 'Meta Network Analysis' section) confirm the fact that language variables are highly intertwined with memory: in the whole sample, the meta-cluster is very large and includes many more general cognition variables beyond memory, as discussed in previous sections. In the typical readers, the meta-cluster is smaller but still includes language aptitude, arithmetic, declarative and working memory variables, possibly indicating the stability of their (intricate) link.
In sum, the network analysis and meta-cluster exploration confirm the complex relationships between memory systems and language. However, disentangling each variable's individual contribution is not straightforward, which is possibly indicative of the fact that representing the language-memory interaction with just one model might be simplistic.

Literacy and mediator skills
In both exploratory graph analyses, with and without the dyslexic readers, reading and reading-related skills cluster together and are dissociated from language and general cognition. There are some noteworthy but unsurprising exceptions, such as the rapid automatised naming metric (fluency.autom.nam) in the first cluster, which also shows strong cross-cluster associations with the reading cluster (3), in particular with word reading fluency (reading.t.all): the association with reading metrics is unsurprising, as automatised naming tasks are known to be robust indicators of reading ability and problems (Bonte and Brem, 2024). Nevertheless, the general pattern emerging from our data suggests a separation between skills supporting spoken language functions versus literacy skills. From an evolutionary perspective, this is not surprising, considering that reading and writing are cultural inventions that are evolutionarily more recent than spoken/other language functions. As such, reading and writing likely required the repurposing of cognitive and perceptual skills that were previously used for different functions (Huettig et al., 2018). Also, until recently, most of the population was not able to read and/or write (UNESCO, 2017), and even now the percentage of the population that is literate varies widely between countries (Our World in Data, 2021). Most of the estimated 6000 to 7000 languages spoken in the world are rarely or never used for reading and writing. Hence, the dissociation in our data between reading skills and other language abilities might reflect how the learning and use of oral versus written language rely on distinct processes with different evolutionary timelines and functions.
Despite the consistent pattern of separation of reading skills from language and cognition in both network analyses, one can observe that the presence or absence of dyslexic readers in the sample modulates the composition of these 'reading clusters'. In our baseline network analysis including the whole sample, one cluster (3) includes all the variables measuring reading skills (words, pseudowords, text) together with spelling abilities and phonological awareness measures (spoonerism and phoneme suppression), with the latter known to be predictive of later reading skills in pre-reading children (Elbro et al., 1998; Thompson et al., 2015). We note that this pattern emerged and remained consistent starting from our baseline gamma level (0.25) and for all the networks with more restrictive (i.e. higher) gamma levels, whereas at lower gamma levels, word and text reading tasks belonged to one cluster, and phonological and spelling tasks were associated with different clusters. A cluster related to reading abilities also emerged in the network analysis without the dyslexic readers. However, the measures that are part of this cluster (6) belong only to text reading and word/pseudoword decoding tasks, whereas other reading-related skills, such as spelling and phonological awareness, are part of other clusters. This pattern of word and text reading tasks clustering together is consistent across all the gamma levels, and starting from the models with a gamma level equal to or above 0.375, they also cluster with rapid automatised naming and/or motor speed abilities. The patterns emerging from the data point to a link between phonological decoding skills and the ability to read fluently when both dyslexic and typical readers are considered, at least when intermediate and strict gamma levels are used. In contrast, when including only typical readers, fluent reading abilities (for words or text) are not strongly linked to the ability to manipulate phonological information. This pattern is
not surprising considering what we know about reading development and dyslexia. Binding orthographic and phonological information is a first key element in reading development (Blomert, 2011). With time and practice, this process of phonological decoding (i.e., associating letters or clusters of letters with their corresponding sounds) allows readers to achieve fluent reading of words, without the need for letter-by-letter decoding (Grainger and Ziegler, 2011). This progression from fine-grained grapheme-by-grapheme decoding to more automatised whole-word reading has been proposed to be impaired in dyslexic readers (Pugh et al., 2000, 2001). This suggests that some dyslexic readers rely more heavily on orthography-to-phonology conversions even after years of reading instruction, and hence that they depend on phonological decoding processes much more than typical readers do. In our work, the clustering of reading-only tasks when typical readers are considered might reflect the independence of automatised reading processes and online phonological manipulation skills in this sample. However, as noted above, the inclusion of the dyslexic readers leads to a tighter grouping of word decoding and text reading with phonological abilities, possibly reflecting a stronger reliance of the dyslexic subgroup on phonological decoding processes to perform fluent word and text reading.

Experience, motivation, code-switching
In general, we observed a complex interaction of motivation for language learning, self-reported multilingual experience, and code-switching habits. Although these variables are not always part of the same cluster, both the network and the meta-network analyses reveal their interplay in both samples, as well as their interaction with other variables such as reading history, attention orienting and phonological skills.
Participants' multilingual experience variables were measured via the entropy calculations on the self-reported LEAP-Q data, as described in the method section, and resulted in the entr.speaking, entr.comprehension and entr.curr.expos scores: these describe participants' speaking skills, comprehension skills, and their current degree of exposure to multiple languages, respectively. We will collectively refer to these variables as "multilingual experience". Participants' cumulative code-switching habits were measured via the code-switching questionnaire, administered collectively for all languages (as explained in the method section), which yielded the code.switching.context and code.switching.unconscious variables, describing contextual and involuntary switching between languages. Motivation for foreign language learning was measured via the MFQ questionnaire and the subsequent factor analysis (see MFQ Factor Analysis): here, we isolated two factors, related to the idealisation of foreign languages and the confidence in using them (variables: motivation1 and motivation2). We will now discuss the interactions between these variables, across samples and analyses.
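As an aside for readers wishing to compute comparable scores, language entropy is essentially the Shannon entropy of a speaker's self-reported language-use distribution. The following is a minimal sketch with hypothetical proportions; the actual computation follows the procedure described in Kepinska et al. (2023):

```python
import math

def language_entropy(proportions):
    """Shannon entropy (in bits) of a language-use distribution.

    `proportions` are self-reported shares of use/exposure per language
    and should sum to 1; higher entropy reflects more balanced
    multilingual engagement, and 0 reflects purely monolingual use.
    """
    return -sum(p * math.log2(p) for p in proportions if p > 0)

# Hypothetical respondent: 70/20/10% exposure across three languages
print(round(language_entropy([0.7, 0.2, 0.1]), 3))  # -> 1.157
```

A balanced bilingual (0.5/0.5) yields exactly 1 bit and a monolingual 0 bits, so the score scales intuitively with multilingual balance.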
In the network analysis, at gamma = 0.25 in the whole sample, multilingual experience clusters with both motivation variables. In the typical reader sample, at gamma = 0.25, we observe two distinct clusters: motivation variables cluster with reading history (measured via the read.diffic.child score in the AHRQ questionnaire), and multilingual experience clusters with the phon.suppres variable (measured via the phoneme suppression task). The clustering of motivation and multilingual experience in the whole sample, and of motivation with reading history in the typical readers, is stable up to gamma = 0.35. Beyond that, the clusters start breaking apart in both samples and become less interpretable.
Code-switching variables make their own cluster in both samples at gamma = 0.25. However, in the whole sample the cluster also includes attentional orienting and motivation variables at gamma = 0.225, just below our baseline level: this possibly indicates that their interplay is somewhat salient and not an accidental observation of the kind that can arise at lower gamma levels. In typical readers, code-switching remains mostly isolated, but we see that the reading history variable clusters with motivation at gamma = 0.225.
These interactions are coherently represented by the meta-network analysis: in the whole sample, multilingual experience, motivation and code-switching form a meta-cluster with attentional orienting (measured via the ANT-I task), whereas in the typical readers we still observe more and smaller clusters: attentional orienting clusters with code-switching, reading history with motivation, and experience with phonological skills.
We will now discuss these results. As in the context of memory systems, here we observe that the whole sample has fewer and larger clusters, both in the network and in the meta-network analyses, whereas the typical readers show more numerous and smaller clusters: this might indicate that there are differences in how typical readers channel their cognitive effort, in a more specialised and possibly more efficient way. However, if we look at both samples, despite the differences in the number of clusters and the variables they include, the variables at play are always the same, even when they break up into smaller groups.
Therefore, it is worth discussing the links between code-switching, multilingual experience, attention and motivation in a more general way, based on the meta-network observations. In the whole sample, the clustering of multilingual experience variables with attention orienting and motivation suggests that there is an interplay of knowing, being exposed to and being confident with languages, and that these factors interact with the way people orient attention (in typical readers, we even observed a meta-cluster including only code-switching and attentional orienting, indicating that this relationship is even more stable). These results are in line with previous evidence: people's multilingual experience is characterised by different acquisition or learning contexts, the topics they talk about, code-switching habits and the attentional control they can or should exert as speakers/receivers (Costa et al., 2008; Kheder and Kaan, 2021). This can explain the link between code-switching and attention, and between foreign language experience and motivation. Moreover, high literacy favours foreign language learning (Dufva and Voeten, 1999), and this could be reflected in the link between reading history and motivation observed in the typical reader sample's meta-network (in the whole sample, reading history clusters with other literacy skills).
It is interesting to note that our variables were measured with independent testing instruments, which we sometimes adapted to the needs of this study, but which nonetheless contributed to a cohesive view. This is the case for the code-switching questionnaire, from which we selected only two indices (Rodriguez-Fornells et al., 2012). It is also the case for the LEAP-Q (Marian et al., 2007), to which we independently applied the entropy score calculation to obtain cumulative indices of multilingual experience (Kepinska et al., 2023). This suggests that deriving dimensions from complex self-reported data is a valuable approach, even though the original questionnaires did not suggest that one could use them to derive composite scores.
As a caveat, self-reported data are subject to a certain variability due to latent constructs that we could not assess here. For example, personality can influence self-reports, and has been shown to modulate language learning (Ghapanchi et al., 2011; Rizvanović, 2018; Robinson et al., 1994; Zhang et al., 2013). However, our results are in line with studies showing that learning strategies and motivation (Karlak and Velki, 2015; Uztosun, 2021) are better predictors of language learning than personality. Listening skills and phonological memory, together with motivation and metacognitive awareness, have also been reported to influence foreign language competence (Bourdeaud'hui et al., 2021; Dufva and Voeten, 1999; Vandergrift, 2005). This could explain why phonological suppression, i.e. one component of listening skills/phonological awareness, is linked to multilingual experience in our typical reader sample.
Overall, we observed complex but coherent interactions between attention skills, code-switching habits, motivation and foreign language experience in both samples.

Musicality and musical training
Musicality is grounded in the ability to understand the structure of music (including harmonic, tonal and temporal hierarchies) (Honing et al., 2015; Stevens, 2012; Trevarthen, 1999), and is a component of musical aptitude (Christiner and Reiterer, 2015; Turker et al., 2019). Musicality has frequently been investigated in conjunction with musical training and language aptitude, as music and language are "auditory phenomena" sharing behavioural and neural features (Milovanov and Tervaniemi, 2011; Nayak et al., 2022; Turker and Reiterer, 2021).
Here, we chose to assess musicality through the widely used AMMA test (E.E. Gordon, 1989). This task measures tonal and rhythmic audiation, a component of musicality encompassing auditory working memory and auditory musical imagery (Grashel, 1991; Platz et al., 2022). A strong positive correlation has been reported between the AMMA tonal and rhythmic scores (Turker et al., 2017), which we also observed (Pearson's r(150) = 0.96, p < 0.0001). We therefore chose to use the total score in our analysis.
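The decision to collapse the two subscores into a total rests on their near-perfect correlation; such a check amounts to the standard Pearson product-moment formula. A minimal sketch (the scores shown are hypothetical, not our data):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical tonal and rhythmic subscores for six participants
tonal = [22, 25, 28, 30, 33, 35]
rhythmic = [23, 24, 29, 31, 32, 36]
print(round(pearson_r(tonal, rhythmic), 3))
```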
We note that the musicality and musical training scores were among the less reliable, with mediocre alpha coefficients (between 0.5 and 0.6) in the full sample and a poor coefficient in the dyslexic readers (0.3 on the musical training questionnaire). This might be due to multiple factors: the difference in sample size between groups, the insufficient number of musical training index items, and a lack of internal consistency of the AMMA test in a participant sample selected to include only non-musicians (i.e. non-musicians, who have had no ear training, may be less consistent and reliable in their performance on the AMMA test than trained and/or professional musicians would be). Nonetheless, these skills cluster together consistently, and, while mindful of the low reliability, it is still worth considering that these metrics might represent a latent construct that would be more easily interpretable in a different sample.
As concerns their interaction with language and cognition, in the whole sample as well as in the typical reader network analysis, musicality and musical training did not interact with language and cognition variables, and this generalised across gamma levels. As the only exceptions, below gamma = 0.075 in the whole sample and below gamma = 0.125 in the typical readers, musicality and musical training clustered with the phono.discrimination variable, measured via the Hindi dental-retroflex task. In the typical readers, the inhibition score (measured via the ANT-I task) was also present in the cluster. As previously explained, however, it is not advised to draw definite conclusions about results found at these low gamma levels. We will thus focus the discussion on the most stable overall result, this being the clustering of musicality and musical training together and their lack of interaction with other variables. The fact that musicality correlates with other musical experience variables, such as training time, cumulative musical experience, or the number of instruments played, is in line with previous results (Bowles et al., 2016; Schneider et al., 2023; Turker et al., 2017). Regarding possible links between musicality and language aptitude, although it has been suggested that musicality-related skills other than audiation (such as pitch perception or singing ability) might be related to pronunciation talent, reading skills, and grammar aptitude (Turker and Reiterer, 2021), our results are in line with studies showing only weak relationships between musicality (also as measured by the AMMA test) and pronunciation talent or general cognition (Turker et al., 2017, 2019). Our results are also in line with those studies having used dimensionality reduction techniques, in which the AMMA score grouped with musical experience measures but not with other variables (Bowles et al., 2016; Turker et al., 2017). Therefore, while the AMMA test is widely used to test audiation as a
component of musicality, other features of musicality might have stronger links with language aptitude.
Given the reviewed literature (Turker and Reiterer, 2021), we would have expected musical training (a component of musical experience) to cluster with productive or receptive phonological skills, measured via the hindi.phono.disc score (Hindi dental-retroflex task) and/or the phono.imit.farsi score (Farsi uvular pronunciation task). However, musical training did not group with these or other language-related measures. A reason for this might be that our target population was composed entirely of non-musicians, who had, on average, almost no musical training: this might also have contributed to the isolation of musical training and musicality (both with low scores) from phonological skill variables.
In the meta-network of the whole sample, musicality and musical training are part of the same cluster and are isolated from other variables, coherent with the network analysis. In the typical reader sample, as in the network analysis at low gamma levels, they group with phonological discrimination (measured via the dental-retroflex task) and with the inhibition variable from the ANT-I task, reflecting a weak (but present) interplay between music-related and language aptitude-related skills. Again, this is to be interpreted with caution, but this result partly aligns with previous literature (Turker and Reiterer, 2021).
In sum, our finding of a lack of association between musicality and language aptitude might have to do with the AMMA test itself, which may be limited in its ability to uncover existing links between musicality and language aptitude, and/or with the low "musical diversity" of the sample itself. Complementing audiation tests such as the AMMA with tasks that assess the processing of vertical (harmony) and long-range horizontal structure of music (musical syntax) may be more informative: i.e. musical audiation may be related to other aspects of music processing, but less to language aptitude or skill. Moreover, musical production, engagement and consumption indices deserve more attention in future research focused on musical cognition, ideally in a more musically diverse sample (i.e. including more highly trained or professional musicians). While the musical cognition network has been widely explored in the neuroimaging literature (Beaty et al., 2016; Koelsch et al., 2002; Schön et al., 2010; Schön and Morillon, 2019; Yu et al., 2017), we are not currently aware of any broad behavioural exploratory studies on music similar to this one on language aptitude.

Limitations and methodological advice
EGA is a relatively recent method that has been gaining traction in cognitive psychology for investigating relationships between behavioural variables, providing data-driven exploration and visualisation of multivariate data structures. EGA allows us to explore how a great number of different variables are interconnected while providing insights into the dimensional structure of psychological constructs. However, there are several limitations to be taken into consideration. In this context, we provide some advice to readers based on our experience developing and carrying out a large testing endeavour and an EGA pipeline, in the hope of offering food for thought (rather than a strict methodological checklist) for similar future projects.
Sample size. Importantly, EGA typically requires a relatively large sample size to produce reliable and stable results. Small sample sizes can lead to spurious connections and less reliable community structures. In the general context of measuring individual differences, some suggest sample sizes of around 200 (Brysbaert, 2024).
Data quality. High-quality, well-measured variables are crucial. Noisy or poorly measured data can distort the network structure, leading to incorrect conclusions, and missing data should be avoided. Moreover, given that (lack of) reliability can affect the network, it is fundamental to maintain access to item-level data, in order to be able to compute internal consistency estimates and possibly remove inconsistent items and/or metrics. Regarding reliability more generally, when several highly correlated indices can be derived, for example from a questionnaire, it is advisable to perform a dimensionality analysis before choosing the metrics to use.
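For instance, internal consistency from item-level data can be estimated with the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of totals). A minimal sketch with hypothetical item scores:

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha from item-level data.

    `items` is a list of item columns, each a list of participants'
    scores on that item (all columns of equal length).
    """
    k = len(items)
    item_variances = [statistics.variance(col) for col in items]
    totals = [sum(scores) for scores in zip(*items)]  # per-participant sums
    return k / (k - 1) * (1 - sum(item_variances) / statistics.variance(totals))

# Hypothetical 3-item questionnaire answered by five participants
items = [[3, 4, 4, 5, 2],
         [3, 5, 4, 4, 2],
         [2, 4, 5, 5, 3]]
print(round(cronbach_alpha(items), 2))  # -> 0.89
```

Access to such item-level columns (rather than only aggregate scores) is precisely what makes this kind of reliability check possible after the fact.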
Thresholding and sparsity. Deciding on the threshold for including edges in the network and on the level of sparsity can significantly impact the resulting graph. There is no one-size-fits-all solution, and the choice of any given level of sensitivity is to some extent arbitrary. In this work, we have shown how the data can be explored at an arbitrarily selected 'default' sensitivity level, which we defined as the intermediate gamma = 0.25. Given the exploratory nature of EGA (as reflected in the method's very name), it is nonetheless advisable to explore if and how the network changes at lower and higher sensitivity levels, compared to the chosen default one.
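To illustrate why this matters, the toy sketch below mimics the effect of increasing sensitivity by thresholding a correlation matrix at increasing levels and counting the connected components of the resulting graph. The correlation values are hypothetical, and EGA's actual GLASSO regularisation is not simple thresholding; this is only a conceptual analogue of how stricter sparsity fragments the network:

```python
from itertools import combinations

def connected_components(n_nodes, edges):
    """Count connected components of an undirected graph (union-find)."""
    parent = list(range(n_nodes))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return len({find(i) for i in range(n_nodes)})

def edges_at_threshold(corr, thr):
    """Keep an edge wherever the absolute correlation exceeds `thr`."""
    n = len(corr)
    return [(i, j) for i, j in combinations(range(n), 2)
            if abs(corr[i][j]) > thr]

# Toy correlation matrix for four variables (hypothetical values)
corr = [[1.0, 0.6, 0.1, 0.1],
        [0.6, 1.0, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.7],
        [0.1, 0.1, 0.7, 1.0]]

# Stricter thresholds progressively fragment the network
for thr in (0.05, 0.3, 0.65):
    n_comp = connected_components(4, edges_at_threshold(corr, thr))
    print(f"threshold {thr}: {n_comp} component(s)")
```

At the loosest threshold everything is one connected network; at stricter levels it splits into the two correlated pairs and then into isolated nodes, mirroring the behaviour reported for high gamma levels above.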
Metric selection. In our results, given the high number of metrics included, one could ask which latent variables (clusters) are worth using in future research. It is difficult to answer this question, since we have observed population/sample size-dependent differences. Nonetheless, we think that our most stable latent variables are the language and cognition cluster, the reading cluster, and the multilingual experience and motivation cluster. These are likely to represent three latent variables, each of them useful for investigating different aspects of the network, perhaps in smaller-scale studies (metric-wise) with larger populations (sample-wise). We note that it is not trivial to trim clusters so as to select their best representatives, given the complexity of the graph structure that we observed. However, some general suggestions can be made, based on the ease of administration and tolerance by participants, as well as on the clustering (and meta-clustering) in the full sample.
Relationship between language and general cognition. Memory and general cognition tasks (DecLearn and CVLT, Digit Span, Corsi Blocks and Raven's APM), together with finite-state artificial grammar tasks such as Brocanto (where chance has no place) and morphosyntactic manipulation (ArtGram), seemingly entertain strong links within and across clusters, and might be the most informative. Of note, Brocanto is a very intensive task (lasting around 40 min), but was nonetheless quite reliable, while the simple and short ArtGram, which we developed ourselves, showed mediocre internal consistency. The latter can, however, be a promising stepping stone for the development of novel explicit grammar tasks for measuring aptitude in adults, which are currently missing from the testing panorama.
Literacy and mediators. The strong literacy cluster is separated from the language-cognition cluster, but densely connected with it. The reading measures (i.e. word and pseudoword decoding and text reading) consistently cluster together in both samples at all gamma levels, whereas mediator tasks (i.e. phonological awareness and rapid automatized naming) show a more heterogeneous pattern of clustering. In light of these results, and considering the heterogeneity of profiles among dyslexic readers, we advise including reading measures as well as naming and phonological assessments in order to fully capture such profiles.
Multilingual language experience. For the cluster encompassing entropy of comprehension and speaking abilities, exposure to multiple languages, and motivation, we suggest averaging the highly correlated speaking and comprehension metrics, or excluding one of them, depending on the specific research question. We believe that measures related to idealisation of, and disposition towards, foreign languages (such as our MFQ factors) should also be considered in multilingualism and language learning studies.
Less informative tasks. As a final comment, we expected some tasks to be highly informative, but contrary to our expectations, their clustering (or lack thereof) posed a challenge to interpretation. This was the case for the Hindi dental-retroflex learning and Farsi uvular production tasks, which we expected to be good proxies of the phonological component of language aptitude in perception and production. It was also the case for the serial reaction time (SRT) task, which we expected to be a good proxy for procedural learning, and possibly to be associated with syntax manipulation, as proposed in the Discussion: Memory systems section and in Ullman (2001).

Conclusions
The quest for underlying commonalities (also referred to as latent variables, constructs, or dimensions) is a classic endeavour in cognitive psychology and psycholinguistics. The classic analytical technique for this purpose is factor analysis. Factor analysis has, however, been criticised for various reasons, ranging from misleading interpretations, such as the essentialisation of latent constructs, to its inability to accommodate more complex structure within sets of variables, including associations across latent constructs. In our analysis, we opted for an exploratory approach that does not preclude the existence of underlying commonalities, while being maximally flexible with respect to other linear associations between any pair of metrics within our set of variables. In order to account for the arbitrary parameterisation of the regularisation procedure in the network analysis (as implemented in the EGAnet package, cf. Golino and Christensen, 2023), we propose a new, multistage procedure that first systematically loops through the parameter settings, and then carries out a meta-cluster analysis to identify variables that are robustly grouped together. We believe this analytical approach has the advantage of being exploratory, while avoiding (over)interpretation of clusters of variables based on a specific variant of the network graph.
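The two-step logic of the procedure can be sketched schematically as follows. This is not our actual pipeline (which uses EGAnet in R); the cluster labels below are invented, and the 75% stability threshold is an illustrative choice:

```python
# Sketch of the loop-then-meta-cluster idea: record which variables land in
# the same cluster across runs (e.g. across gamma settings), then keep only
# groupings that are stable across runs. All inputs here are hypothetical.
import numpy as np

n_vars = 6
# Hypothetical cluster assignments from 4 runs: variables 0-2 always
# co-cluster, 3-4 always co-cluster, variable 5 is unstable.
runs = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 1],
    [2, 2, 2, 1, 1, 0],
]

# Affinity matrix: fraction of runs in which each pair shares a cluster.
co = np.zeros((n_vars, n_vars))
for labels in runs:
    labels = np.asarray(labels)
    co += (labels[:, None] == labels[None, :]).astype(float)
co /= len(runs)

# Meta-clusters: connected components of the graph that keeps only pairs
# co-clustered in at least 75% of runs.
stable = co >= 0.75
meta = -np.ones(n_vars, dtype=int)
cur = 0
for i in range(n_vars):
    if meta[i] == -1:
        stack = [i]
        while stack:
            j = stack.pop()
            if meta[j] == -1:
                meta[j] = cur
                stack.extend(np.flatnonzero(stable[j] & (meta == -1)))
        cur += 1
print(meta.tolist())  # → [0, 0, 0, 1, 1, 2]
```

In this toy example, the unstable variable 5 ends up isolated in its own meta-cluster, which mirrors how our procedure prevents over-interpreting a grouping that only appears at some parameter settings.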
Our results show a very complex interplay of variables, beyond language aptitude (phonology, grammar, lexicon) and general cognition (memory, attention, intelligence). We observe how general cognition, linguistic and non-linguistic experience, the idealisation of (and confidence with) languages, literacy history, and skills and aptitudes in other cognitive domains (dexterity, arithmetic, music) all play roles in this complex system. These roles are not always easy to interpret, but our results highlight the necessity of a more comprehensive view of language and of cognition as multivariate systems. This is coherent with the existence of a positive manifold as a general organising principle of cognition, i.e. within but also beyond the domain of intelligence, where the idea was initially developed (Borg, 2018; Van Der Maas et al., 2006). In this view, the highly compositional manner (Yang et al., 2019) in which cognitive tasks organise and interact in our study, sometimes in isolation and sometimes building upon one another, invites broader explorations, and specifically ones that go beyond the idea that one task measures one dimension, and that language aptitude is a purely linguistic or a purely cognitive dimension.
Multimodal investigations (i.e. leveraging different data modalities, such as brain or genetic data, also obtained in our sample) can be essential for exploring this complex interplay comprehensively. Neural data could clarify and better characterise some of the behavioural patterns observed. For example, certain behavioural dissociations, such as the separate clustering of literacy versus other language skills, could be mirrored at the neural level, confirming that some clusters of skills that dissociate behaviourally do indeed rely on non-overlapping brain areas and functions. Neural data could also inform us about the mechanisms underlying the differences observed between our two samples. In a study from our research group (Balboni et al., in prep), we explore the relationship between the behavioural measurements present in this study and whole-brain activation during speech perception in the participants' native language, which is known to be modulated by multilingualism (Jouravlev et al., 2021). Future investigations will also examine relationships between brain function and functional connectivity during task performance and during resting state, brain structure and structural connectivity, and behavioural measures obtained in and outside the scanner, to better elucidate the neurocognitive differences underlying individual differences in language and in cognition (e.g. Kepinska et al., 2017a).

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
comparable between typical and dyslexic readers. The few inconsistencies we observed likely arise from a number of factors: a) the considerable difference in sample size (much smaller for the dyslexic readers, N = 29) and the smaller number of items composing certain indices (such as code switching and musical training), b) the difference in text reading consistency between the two groups (dyslexic readers: α = 0.99, typical readers: α = 0.62), and c) the poorer internal consistency of the dyslexia sample compared to the typical readers on the Hindi dental-retroflex contrast task (dyslexic readers: α = 0.45; typical readers: α = 0.68), likely due to phonological processing or access deficits (Ramus

Fig. 2. Panel a) Language backgrounds as measured by the LEAP-Q for the first three languages, in order of dominance (colours represent percentages in descending order). Panel b) Participants' mastery of French when used as the first (L1) to sixth (L6) dominant language. Panel c) Participants' socioeconomic status (N = 152).

Fig. 4. Network analysis of the data excluding the dyslexic participants, at intermediate sensitivity (gamma = 0.25).

Fig. 5. Graph of the confirmatory factor analysis fitting the cluster structure from the network analysis, with additional correlations. The linear associations are shown as standardised estimates.

Fig. 6. Cluster affinities for all variables across the 20 EGA analyses with different gamma parameters; all data.

Fig. 7. Cluster affinities for all variables across the 20 EGA analyses with different gamma parameters; no dyslexics.

Fig. 8. Variables that systematically cluster together across different gamma parameters (all data).

Table 2
Distribution characteristics and reliability of the z-scored data (suitable metrics). Other reliability measures or unavailable reliability data are indicated where relevant.

Table 4
Variables and cluster membership in the meta-network, all data.

Table 5
Variables and cluster membership in the meta-network, no dyslexics.