Exploring the effects of target-language extramural activities on students' written production

Frequent engagement in English extramural activities (i


Introduction
It is well established that language exposure is crucial for learning a second/foreign language (L2) (e.g., Ellis et al., 2016; Tyler et al., 2018). For students learning English as an L2, formal instruction in the classroom remains important; however, increasingly, students are also exposed to English outside the classroom, through activities such as gaming and watching YouTube videos. Such exposure to, and use of, English is referred to as Extramural English (EE; Sundqvist, 2009). Focusing solely on what happens in the classroom, which most studies on L2 acquisition and use do, thus offers a very limited picture of L2 English development in a country such as Sweden, where the time learners spend on EE activities largely exceeds their English lesson time (Olsson & Sylvén, 2015; Sundqvist, 2009). In response to this, there is a growing interest in studies on the possible impact of these informal and self-initiated language activities on various aspects of L2 English proficiency. To date, numerous studies conducted in Sweden (e.g., Olsson & Sylvén, 2015; Sundqvist, 2009, 2019), as well as similar contexts like Denmark (e.g., Hannibal Jensen, 2017) and Flanders, Belgium (Peters, 2018; Peters et al., 2019), have demonstrated a positive impact of engagement in EE activities on students' receptive English skills. These studies have specifically highlighted the positive effects on students' vocabulary knowledge, as well as their listening and reading comprehension abilities.
However, as a field, our knowledge of the relation between EE activities and students' L2 production remains rudimentary (though see Olsson & Sylvén, 2015; Sundqvist, 2019; Sundqvist & Wikström, 2015). The possible effect of EE on writing is particularly under-researched, which is problematic given that writing has emerged in several studies as the skill that Swedish students struggle with the most in English, thus calling for more research and new methods for L2 writing instruction (e.g., Sehlström et al., 2022; Sundqvist et al., 2019). What is more, whereas assessment of students' vocabulary features prominently in studies on extramural activities, grammatical and broader lexical aspects have received very limited focus. As both grammatical and lexical complexity have been shown to be strongly correlated with writing quality (Casal & Lee, 2019; Kyle & Crossley, 2016), examining the relationship between extramural activities and linguistic complexity would help us better understand the role that such activities play for students' language development.
In the present study, we aim to study the effects of EE activities on students' written production. To do so, we use data from a recently compiled corpus, the Swedish Learner English Corpus (SLEC; Kaatari et al., forthcoming), which comprises argumentative texts written by Swedish junior and senior high school students. Most studies in the field of learner corpus research have focused on learners at the advanced levels, using corpora such as the International Corpus of Learner English (ICLE; Granger et al., 2020) and the Varieties of English for Specific Purposes dAtabase (VESPA; Paquot et al., 2022). What sets SLEC apart from these corpora is that it includes English texts written by learners at intermediate levels, thus responding to the call made by Paquot and Plonsky (2017, p. 87) to expand learner demographics in learner corpus research. In addition, SLEC makes it possible to study the relationship between EE and writing development, as it includes information on how many hours per week students (i) engage in conversations in English, (ii) communicate in English while playing computer/video games, (iii) read in English, (iv) spend time on social media with English content, and (v) watch TV shows or movies in English. The present study thus examines the effects of these five EE activities on both lexical and grammatical features in student writing. Specifically, we focus on examining the effects of EE on lexical diversity and noun phrase (NP) complexity. The following research questions are investigated using measured variable path analysis from the structural equation modeling framework (Larsson et al., 2021, 2022):
• What is the relative effect of EE activities vis-à-vis classroom factors when it comes to lexical diversity and/or NP complexity?
• To what extent are there differences between receptive EE activities and other types of EE activities in terms of their effect on lexical diversity and NP complexity, and what are the differences?

Extramural English and language learning
As an emerging field of research, out-of-class, informal language learning through EE activities has gained momentum over the last decade (Schwarz, 2020). One main research strand in this field is concerned with documenting the amount and types of EE activities learners of different ages and from different contexts are engaged with, typically drawing on data gathered from questionnaires, interviews, and learner diaries (see Lee, 2022, for an overview of the main instruments used in the area). An additional strand attempts to understand the relationship between learners' EE activities and different areas of language learning by combining data of exposure with learners' proficiency data. Certain areas of students' language development have been found to be positively influenced by EE activities, whereas for others, the picture is somewhat more mixed or, in some cases, incomplete.
However, when it comes to studies exploring the relationship between EE practices and learners' vocabulary knowledge, we are presented with a more complex picture. A generally positive relation between vocabulary and exposure to EE was observed in many studies (e.g., Bollansée et al., 2020; De Wilde & Eyckmans, 2017; De Wilde et al., 2020, 2021; Hannibal Jensen, 2017; Lee, 2019; Peters, 2018; Peters et al., 2019; Puimège & Peters, 2019; Sundqvist, 2009, 2019; Sundqvist & Wikström, 2015; Sylvén & Sundqvist, 2012). However, other studies noted differences across different kinds of EE activities and measures, sometimes displaying conflicting results. For instance, Peters (2018) showed a positive relation between learners' vocabulary knowledge and their exposure to non-subtitled TV and movies, the Internet, and written print, but no correlation with playing computer games (see also Muñoz et al., 2018). In addition, Schwarz (2020) detected a positive relationship between EE and learners' receptive, but not productive, vocabulary size. In Bollansée et al. (2020), playing games and watching TV in English (without subtitles) were positively correlated with scores on a productive vocabulary test, whereas the opposite was found regarding watching TV with L1 subtitles.
H. Kaatari et al.
One explanation for these mixed results may have to do with which aspects of vocabulary knowledge were assessed in these studies. In Muñoz et al. (2018) and Peters (2018), vocabulary knowledge was measured through a vocabulary test targeting meaning recognition, whereas Sylvén and Sundqvist (2012) and Sundqvist and Wikström (2015) focused primarily on productive vocabulary. Another explanation may lie in the learner group in question. A positive relationship between EE frequency and productive vocabulary knowledge was found among Swedish learners of English (e.g., Sundqvist, 2019; Sundqvist & Wikström, 2015; Sylvén & Sundqvist, 2012), but not among Korean learners (Lee, 2019). In addition, the types of EE activities studied (e.g., watching TV and movies with or without subtitles in L1 or L2, playing computer games of various types) and the interrelationship between EE and other learner variables such as age, gender, and proficiency result in a multifaceted picture, thus making it difficult to obtain a clear-cut answer as to the influence of EE practices on learners' vocabulary development.
While we have a broad range of studies looking at vocabulary in this context, our knowledge of the effect of EE on students' grammar and writing skills remains limited. Muñoz et al. (2018) is one of the few studies that explores the association between EE and learners' receptive English grammar skills, measured by a test consisting of 80 multiple-choice items targeting learners' receptive knowledge of 20 English grammar features (e.g., negation, relative clauses, singular/plural inflection). The results suggest that EE (particularly with audiovisual material) is positively correlated with young learners' (7-9 years old) receptive grammar skills. However, this view does not seem to be shared by teachers. In a study of teachers' perception of the effects of EE on their students' language learning, Schurz and Sundqvist (2022) found a weak or even negative relation between EE and grammar skills. In terms of writing proficiency, Olsson (2012) found that frequent EE activities may have an impact on writing proficiency in English (e.g., sentence length, use of infrequent vocabulary), and Sundqvist (2019) found that frequent EE use led to more advanced vocabulary in free writing essays. However, as the former study was based on a fairly small corpus of 74 learner texts of two types (letters and newspaper articles) and the latter on a small subsample (N = 16), the generalizability of the results is somewhat limited.
Against this background, there are still many unanswered questions regarding the relationship between EE and aspects of L2 development including, but not limited to, knowledge of grammar and writing proficiency (Sylvén & Sundqvist, 2012). With some notable exceptions, the learners' data in previous studies came almost exclusively from language tests of various sorts, often with multiple-choice questions, which may not be able to identify the possible effects of various types of EE activities (see Fulcher, 2015). To add to our understanding of the potential relation between EE and language development, looking more into student production, most notably their writing, seems like a natural next step. As Olsson (2012) points out, one reason why learners' written production has been less investigated so far is the limited availability of large learner corpora that include not only texts, but also rich metadata on learners' extramural exposure to English. Our new learner corpus, SLEC, was born precisely out of this need, and enables us to address the underexplored areas of EE and L2 development.

Linguistic complexity and language learning
As an important measure of L2 development and proficiency, linguistic complexity has been studied extensively in recent years (Pallotti, 2015). Due to its multifaceted nature, many different definitions and operationalizations have been offered in applied linguistics (see, e.g., Bulté & Housen, 2014; Pallotti, 2015). Linguistic complexity is often considered to comprise lexical and grammatical complexity; this study considers two grammatical complexity measures and one lexical complexity measure.
Lexical complexity is commonly considered to comprise subcomponents such as diversity, density, and sophistication (Michel, 2017). However, as our study is merely an initial attempt at looking at lexical complexity in relation to EE activities, we limit the analysis to lexical diversity. McCarthy and Jarvis (2007) define lexical diversity as "the range and variety of vocabulary deployed in a text by either a speaker or a writer" (p. 459). Lexical diversity has been shown to increase with proficiency in that as the target language proficiency increases, learners employ a wider range of lexemes (Bulté & Housen, 2014; Crossley et al., 2014; Nation & Webb, 2011).
Grammatical complexity is defined as "the addition of structural elements to 'simple' phrases and clauses" (Biber et al., 2020: 5). Most recent studies in the field have moved away from measures that conflate the syntactic and structural characteristics of linguistic features, thereby making it possible to disentangle the specific linguistic features that contribute to grammatical complexity (see discussions in, e.g., Biber et al., 2020, 2023; Larsson & Kaatari, 2020). Instead, most recent studies recognize the importance of distinguishing between phrasal and clausal complexity. Phrasal complexity, in particular NP complexity, has been found to be associated with more advanced and register-appropriate writing (Kyle & Crossley, 2018; Larsson & Kaatari, 2020; Taguchi et al., 2013). For example, Casal and Lee (2019) and Lan et al. (2019) found that higher-proficiency students use more attributive adjectives; Lan et al. (2019) also found lower frequencies for prepositional phrases in high-proficiency writing.

Corpus
The data used come from the Swedish Learner English Corpus (SLEC; Kaatari et al., forthcoming). In this study, we used a subsample of 200 texts from SLEC written by senior high school students, which have been manually cleaned: all spelling and orthographic mistakes/inconsistencies in the texts have been corrected (but any grammatical errors are left in). The cleaning of texts improves the accuracy of type and token counts, which in turn enables more accurate lexical diversity scores, and it also improves the automatic identification of grammatical complexity features (see Hồng Châu & Bulté, 2023). An overview of the data used in the present study is provided in Table 1.
As mentioned in Section 1, a distinguishing feature of SLEC is that it includes self-reported information on the time students spend on five EE activities: CONVERSATION, GAMING, READING, SOCIAL MEDIA and WATCHING. These activities can be broadly classified into 'receptive' and 'other', with READING and WATCHING covering the receptive dimension and CONVERSATION, GAMING and SOCIAL MEDIA including both receptive and productive elements. READING and WATCHING encompass the number of hours per week dedicated to reading books, magazines, and newspapers, as well as watching movies and TV series in English, respectively. CONVERSATION, GAMING and SOCIAL MEDIA provide information about the number of hours per week that students dedicate to conversing in English, playing computer/video games that involve English communication either through speech or writing, and using apps and websites with English content, respectively. The distribution of the self-reported time allocation across these EE categories is given in Fig. 1 (see Kaatari et al., forthcoming).
In addition to these EE activities, we also make use of three classroom variables in the analysis: PROGRAM, COURSE, and GRADED. The distribution of the number of texts across these three variables is included in Table 2.
Swedish senior high school students have the option to select between two program types: academic and vocational. Academic programs consist of courses that equip students for future university studies. In contrast, vocational programs are specifically tailored to train students for various occupations, including mechanics, electricians, and chefs. Rather than including school year as a variable, we have opted for COURSE instead. Typically, students in their tenth school year study English 5, and students in their eleventh school year study English 6; however, some students in vocational programs might study English 5 across two school years. It should also be noted that English 5 is obligatory for all programs (academic and vocational) whereas English 6 is obligatory only for some programs. Finally, we also take into account whether or not the texts have been graded, meaning whether or not the students were informed that their texts would be graded by their teacher, which may have an impact on students' motivation to take the task seriously.

Operationalizing lexical diversity and NP complexity
Lexical diversity is a measure of the diversity of lexical items produced in a particular text. Lexical diversity has been used as an important measure in language assessment and has been shown to predict writing quality (e.g., Crossley et al., 2014). Type-token ratio (TTR) is a commonly employed and straightforward measure to assess lexical diversity. It involves dividing the number of distinct words (types) by the total number of words (tokens). However, it is widely acknowledged that TTR has a drawback concerning its sensitivity to text length. Longer texts tend to yield lower TTR values compared to shorter texts (Kyle et al., 2021). There are many measures available that try to correct for the sensitivity of TTR to text length. In the present study, we make use of moving average type-token ratio (MATTR; Covington & McFall, 2010). Unlike TTR, MATTR calculates TTR on several segments in a text and averages them. MATTR has been shown to produce stable results both on shorter texts and on texts of different lengths, as it takes text length out of the equation (Zenker & Kyle, 2021). In order to calculate MATTR, we used the Tool for Automatic Analysis of Lexical Diversity (TAALED 1.4.1; Kyle et al., 2021). More specifically, we used 'mattr_50_cw', which takes segments of 50 words and only includes content words in the analysis.
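To make the difference between TTR and MATTR concrete, the following is a minimal Python sketch of the two measures as described above. It is not the TAALED implementation: the function names are hypothetical, the input is assumed to be a list of tokens that has already been tokenized and filtered to content words, and the fallback to plain TTR for texts shorter than the window is an assumption made for illustration.

```python
def ttr(tokens):
    """Type-token ratio: number of distinct tokens divided by total tokens."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Moving-average TTR: mean TTR over every run of `window` consecutive tokens.

    Assumption for this sketch: texts no longer than the window fall back
    to plain TTR over the whole text.
    """
    if len(tokens) <= window:
        return ttr(tokens)
    # One TTR value per window position, sliding one token at a time.
    scores = [ttr(tokens[i:i + window]) for i in range(len(tokens) - window + 1)]
    return sum(scores) / len(scores)


# Toy usage with an artificially small window of 2 tokens.
toy = ["good", "good", "life", "life"]
print(ttr(toy))              # TTR over the whole text
print(mattr(toy, window=2))  # mean of the TTRs of the three 2-token windows
```

Because every window has the same length, the averaged value does not shrink simply because a text is longer, which is the property that motivates the choice of MATTR here.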
Regarding NP complexity, we focus on two features that have been identified as key for distinguishing speech from writing (Biber et al., 2020; Larsson & Kaatari, 2020): attributive adjectives and prepositional phrases. Specifically, we look at the normed frequencies (per 1,000 words per text) of the number of attributive adjectives (adjectival modifiers) and the number of prepositional phrases functioning as postmodifiers (prepositional modifiers) in a noun phrase, as illustrated in (1) and (2) (examples from Kyle & Crossley, 2018: 341).
(1) The man with the [black] amod coat gave that [small] amod dog some food.
(2) The man [with the black coat] prep gave that small dog some food.
All adjectival and prepositional modifiers were automatically identified and calculated using the Tool for Automatic Analysis of Syntactic Sophistication and Complexity (TAASSC 1.3.8; Kyle, 2016).
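The norming step mentioned above (frequency per 1,000 words per text) can be sketched as follows. This is only an illustration of the arithmetic, not TAASSC output: the function name and the counts in the usage example are hypothetical.

```python
def normed_frequency(count, n_tokens, per=1000):
    """Scale a raw feature count to a rate per `per` tokens of running text."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return count * per / n_tokens


# Hypothetical text: 12 adjectival modifiers in a 300-word text.
print(normed_frequency(12, 300))  # rate per 1,000 words
```

Norming in this way makes counts comparable across texts of different lengths.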

Structural equation modelling
To be able to answer our research questions, we employed measured variable path analysis from the structural equation modeling framework (see Larsson et al., 2021, 2022). Structural equation modeling (SEM) is a versatile family of statistical techniques that allows for inclusion of multiple independent and dependent variables in a single exploratory system (Hancock & Schoonen, 2015; Kline, 2016). While techniques from this framework scale up to advanced models that can accommodate so-called latent, or unmeasured, variables (see, e.g., Larsson et al., 2022), we will here model measured variables in a measured variable path analysis. This technique enables us to test our theories of hypothesized effects of our independent variables on our dependent variables.
Measured variable path analysis models are confirmatory, meaning that they are used to test hypotheses that are based on theory and previous findings. In practice, this tends to entail running several competing models (representing competing hypotheses) to assess which one best fits the data. Model fit is assessed using a range of model-fit indices, the most commonly reported being Chi-square (χ²), Comparative Fit Index (CFI), Root Mean Square Error of Approximation (RMSEA), and Standardized Root Mean Square Residual (SRMR); Akaike Information Criterion (AIC) is also used to compare models in terms of their relative fit. We reject a model (either as a whole or parts of it) if it does not have acceptable fit and retain it otherwise; the common ranges for acceptable fit are summarized in Table 3 (see, e.g., Kline, 2016, for a critical discussion of these ranges, and Larsson et al., 2021: Table 3, for a simplified overview).
The framework enables us to test specific hypotheses (our hypothesized model), rather than the very general null hypothesis (Larsson et al., forthcoming). We fitted three competing models, testing three competing hypotheses. All models included in this paper were fitted using the lavaan package (Rosseel, 2012) in R (4.3.0; R Core Team, 2023). The code used for the present study follows the same structure as the code used for fitting the models in Larsson et al. (2021).
Hypothesis 1. EE activities have an effect on lexical diversity and NP complexity; classroom factors do not.
The first hypothesis (Model 1) is formulated based on studies that have found an effect of EE activities on features of L2 language production (e.g., Olsson, 2012; Olsson & Sylvén, 2015; Prophète et al., 2022), but that do not take classroom variables into consideration. The second hypothesis (Model 2) is that classroom variables are what matter when it comes to L2 language production. Any study of L2 linguistic complexity development in a classroom setting that does not take EE into consideration would indirectly make this assumption. Our third hypothesis (Model 3) is a logical extension of studies that have looked at EE and classroom activities separately and found that these variables are important for predicting L2 language production.
In all three models, we allow all the EE activities to covary, and we also allow adjectival modification and prepositional modification to covary, based on findings from previous studies that suggest that phrasal features commonly do (e.g., Biber et al., 2023). Due to sampling issues, PROGRAM + GRADED and COURSE + GRADED, respectively, are also allowed to covary in all our models. Future studies may wish to build on the results from our analysis to be able to fit more specific models testing more refined hypotheses. The hypotheses are summarized graphically in path diagrams in Figs. 2-4. In all three models, our five EE activities and the three classroom variables are the independent variables, and the three measures of linguistic complexity are the dependent variables.
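As an illustration of how the three competing hypotheses differ only in their predictor sets, the sketch below assembles model specifications in lavaan-style syntax, where `~` denotes a regression and `~~` a covariance. It is written in Python purely to build the specification strings; it is not the authors' code, and all variable names (e.g., `lexical_diversity`, `social_media`) are hypothetical stand-ins for the measures described above.

```python
from itertools import combinations

# Hypothetical column names standing in for the variables described in the text.
OUTCOMES = ["lexical_diversity", "adjectival_mod", "prepositional_mod"]
EE = ["conversation", "gaming", "reading", "social_media", "watching"]
CLASSROOM = ["program", "course", "graded"]


def model_syntax(predictors):
    """Build a lavaan-style specification: one regression per outcome,
    plus the covariances that the text says are allowed in all three models."""
    lines = [f"{y} ~ {' + '.join(predictors)}" for y in OUTCOMES]
    # All pairs of EE activities are allowed to covary.
    lines += [f"{a} ~~ {b}" for a, b in combinations(EE, 2)]
    # The two NP complexity measures are allowed to covary.
    lines.append("adjectival_mod ~~ prepositional_mod")
    # Classroom covariances attributed to sampling in the text.
    lines += ["program ~~ graded", "course ~~ graded"]
    return "\n".join(lines)


MODELS = {
    "model1": model_syntax(EE),              # Hypothesis 1: EE activities only
    "model2": model_syntax(CLASSROOM),       # Hypothesis 2: classroom factors only
    "model3": model_syntax(EE + CLASSROOM),  # Hypothesis 3: both
}

print(MODELS["model3"])
```

Each resulting string could, in principle, be passed to an SEM package that accepts this syntax; how the exogenous covariances are handled in estimation depends on the software settings and is left open here.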

Results
In Section 4.1, we compare the fit of our three competing models to answer our first research question. In Section 4.2, we zoom in on the best-fitting model to answer our second research question.

The effect of EE activities and classroom factors
To answer our first research question about the relative effect of EE activities vis-à-vis classroom factors when it comes to lexical diversity and/or NP complexity, we look at the model fit for our three competing models. Table 4 summarizes the fit indices for the models.
If we cross-reference these results and the recommended ranges from Table 3, we can see that none of the models has terrible fit, but there is one model that stands out: The best-fitting model according to all three fit indices reported is Model 3. This model also has the lowest AIC. In addition, this model not only has the best fit of the three, it also has an acceptable to good fit overall, with the CFI being above 0.95 and SRMR under 0.08. The RMSEA is just over the recommended 0.06 level, which can perhaps be expected given that it is common for models with a low number of variables to have a slightly higher RMSEA. Based on the relative and absolute fit, we retain Model 3; the two other models are rejected on the grounds that they have worse fit and do not reach the thresholds for any of the fit indices. Based on these results, we can conclude that the explanatory system outlined through Model 3 best fits our data and, thus, that both EE activities and classroom activities are important for L2 linguistic complexity development.
However, the overall model fit provides a relatively coarse-grained picture of the effect of EE activities in that it primarily answers the question of whether this is an acceptable exploratory system. It does not in and of itself provide information about the relative effect of individual variables; we turn to this next.
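The model comparison described in this section can be sketched as a simple check of each model's indices against the cut-offs mentioned in the text (CFI above 0.95, RMSEA around 0.06, SRMR under 0.08), followed by a comparison of AIC values. The function names are hypothetical, and the numbers in the usage example are invented for illustration; they are not the values reported in Table 4.

```python
# Cut-offs as mentioned in the running text; "min" means the index should be
# at least the cut-off, "max" means it should be at most the cut-off.
CUTOFFS = {"cfi": ("min", 0.95), "rmsea": ("max", 0.06), "srmr": ("max", 0.08)}


def check_fit(indices):
    """Return, for each index, whether it meets its cut-off."""
    result = {}
    for name, (kind, cutoff) in CUTOFFS.items():
        value = indices[name]
        result[name] = value >= cutoff if kind == "min" else value <= cutoff
    return result


def best_by_aic(models):
    """Return the name of the model with the lowest AIC (best relative fit)."""
    return min(models, key=lambda name: models[name]["aic"])


# Invented fit indices for two hypothetical models (NOT the values in Table 4).
hypothetical = {
    "A": {"cfi": 0.97, "rmsea": 0.07, "srmr": 0.05, "aic": 1510.0},
    "B": {"cfi": 0.90, "rmsea": 0.11, "srmr": 0.09, "aic": 1542.0},
}
for name, indices in hypothetical.items():
    print(name, check_fit(indices))
print("lowest AIC:", best_by_aic(hypothetical))
```

A model such as the hypothetical "A" above, which narrowly misses one cut-off while meeting the others and having the lowest AIC, mirrors the kind of judgment call discussed in the text: the cut-offs are conventions rather than strict decision rules.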

Differences among different kinds of EE and classroom activities
We will now present and discuss the output from the retained model to be able to see what the effect was of individual variables and thus answer our second research question: to what extent the effect of receptive EE activities is different from that of other types of EE activities with regard to lexical diversity and NP complexity. As Model 3 was deemed to have acceptable fit, we can look at the path coefficients for each variable. The standardized results are shown in Fig. 5; only those that reached statistical significance at the .05 level are shown.
Starting with the direct effects of our classroom and EE activities on our three measures of linguistic complexity, we can see that only four paths were statistically significant. All four were positive, meaning that the model, for example, predicts a 0.226 standard deviation increase in ADJECTIVAL MODIFICATION for every additional hour a student spends READING per week. Similarly, CONVERSATION and WATCHING (movies in English) have a positive effect on LEXICAL DIVERSITY (0.161 and 0.154, respectively). The effect of COURSE on LEXICAL DIVERSITY is also positive. As COURSE is a categorical variable (with the less advanced course being the baseline), the model predicts that students in the more advanced course have a mean score of LEXICAL DIVERSITY that is 0.224 standard deviations higher than the students in the less advanced course.
With regard to correlations among the independent variables, all the correlations among the EE activities were significant, except for the one between CONVERSATION and WATCHING (r = 0.09). The remaining correlations ranged between r = 0.21 for GAMING and READING to r = 0.57 for SOCIAL MEDIA and WATCHING. As noted in Section 2.3, the correlations between the classroom variables were expected based on the sample. Finally, there is a positive correlation between two of our dependent variables (ADJECTIVAL MODIFICATION and PREPOSITIONAL MODIFICATION), as predicted based on previous research.
In order to illustrate what texts with high vs. low frequencies for our complexity features look like, we will now turn to some text excerpts. Excerpts (3) and (4) are taken from a text with a high frequency of attributive adjectives (ADJECTIVAL MODIFICATION; italicized) and prepositional postmodifiers in noun phrases (PREPOSITIONAL MODIFICATION; underlined). Texts exhibiting high and low LEXICAL DIVERSITY can be found in (5) and (6), respectively.
(3) Therefore, poverty and a good life are not diametrically opposed as many people assume, a positive state of mind while living the simplest terms could be an important factor promoting your willing in life where the material splendor and prosperity are not a part of it. Therefore, being able to purchase materialistic objects could be enough to bring you satisfaction and success. [G_2_S_M_22_54]
(4) Happiness is the ultimate goal everyone tries to achieve; however, the definition of a happy life seems to differentiate between folks where everyone has an own concept of living a "good life". In other words, the manner of living has various standards for different individuals where the human desires are infinite. [G_2_S_M_22_54]
(5) If you want to have a good life, you must have money because if you do not have money, you cannot have a home and you cannot buy food. Money is also a part of the fact that drug abuse can occur. I think you should get a chance to have a job so you can earn your own money because I think it also leads to less drug abuse in this world. [G_1_Y_M_21_10]
(6) Living a good life might be one of the hardest things to achieve. There are so many factors, internal and external that can push your life in an unlimited amount of directions. This leads to life being quite unpredictable which stresses a lot of people out. From people my age I hear a lot about wanting to improve as a person, study, eat healthy and becoming what the internet has named "that girl". [G_1_S_F_21_128.txt]
With regard to our second research question, we can note very subtle differences between the receptive EE activities and the other EE activities. The only receptive EE activity that had a significant effect (READING) impacted grammatical complexity only, whereas CONVERSATION and WATCHING had an impact only on LEXICAL DIVERSITY. We may therefore draw the tentative conclusion that the type of input may have an effect on students' writing, such that written input is more likely to affect students' grammatical complexity, whereas spoken (or mixed) input may have an effect on their lexical complexity. However, we did not see a relation between READING and PREPOSITIONAL MODIFICATION. In what follows, we discuss three excerpts drawn from the corpus (Excerpts (7), (8), and (9)) as illustration.
(7) It is proven that having a hobby you are truly passionate about and can always look forward to has a lot of positive factors. Regardless of what type of hobby it is, it can be very motivating to push your limits and achieve things you never knew were possible. As if that wasn't enough, hobbies do also naturally reduce stress, negative thoughts and make time go by faster. Instead of developing bad habits and falling deeper into the well-known sleep, eat, work and repeat cycle you can spice things up and make your everyday life more interesting. Meet new people, improve patience, avoid constant boredom, discover your hidden talents and even provide yourself with additional income. Everything is possible, you just have to try. [G_1_Y_F_21_61]
(8) Therefore, poverty and "a good life" are not diametrically opposed as many people assume, a positive state of mind while living the simplest terms could be an important factor […] where the material splendor and prosperity are not a part of it. Therefore, being able to purchase the materialistic desired objects could be enough to bring you satisfaction and success. Generally speaking, a good life is when you feel complete and satisfied, many individuals conclude that being financially and economically stable could make them physically and emotionally happy. On the contrary, affording everything material wise, acquaintances and hierarchy but could not buy you nutrition but only a temporary happiness. Knowledge, friendship and love could be considered an eternal feeling for many individuals where surrounding themselves with positive energy help them face obstacles in life with an optimistic look where they find their own definition of a "good life". In addition, choosing a path for your career dictate the way your life will go, whether it is a career with a huge financial gain, or your aspiring desire could find themselves living a good life in their eyes, as long as their life revolve around the career of their liking. [G_2_S_M_22_54]
(9) Social life it is important for me, my family is important for me they do every day to special days. Health is important, I ride and have a horse and she are important for me. When I have a bad day she make it good, I can ride in the woods, cuddle with her and be in the stable when I need to be self. [G_2_S_F_22_4]
Excerpts (7) and (8), produced by students that spend different amounts of time watching and reading in English, can help illustrate the effects of these two types of EE activities on lexical diversity and NP complexity. Excerpt (7) was taken from a text with the second highest LEXICAL DIVERSITY (0.8878) in the corpus, but the frequency of grammatical complexity features is not as high. The writer reported a high frequency of WATCHING (together with CONVERSATION and SOCIAL MEDIA: more than 20 h per week for each). Despite a high score for LEXICAL DIVERSITY, there are very few instances of PREPOSITIONAL MODIFICATION (0.0755) in the text as a whole, and none in this excerpt. When it comes to ADJECTIVAL MODIFICATION, although there is a relatively high frequency of occurrence (0.1792), as we can see from the excerpt, most of the noun phrases with ADJECTIVAL MODIFICATION follow a simple pattern with one single adjective preceding the head noun as in additional income, bad habits, constant boredom, negative thoughts, and new people. While beyond the scope of the present study, it seems that it would be fruitful to look more closely at LEXICAL DIVERSITY specifically in relation to the attributive adjectives used.
By contrast, in Excerpt (8), which was produced by a student with comparatively less frequent exposure to English through WATCHING, CONVERSATION and SOCIAL MEDIA than the other students (4, 5, 4 h per week, respectively) but with a comparatively high number of hours spent READING (5 h per week), we see a greater amount and variety of both ADJECTIVAL MODIFICATION and PREPOSITIONAL MODIFICATION, despite a somewhat lower LEXICAL DIVERSITY (0.8823). Indeed, the text has the highest score for ADJECTIVAL MODIFICATION (0.3617) and a fairly high score for PREPOSITIONAL MODIFICATION (0.1702). As we can see in the excerpt, with regard to ADJECTIVAL MODIFICATION, the student employed various types of adjectives, including superlative adjectives (as in the simplest terms) as well as present- and past-participle adjectives (the materialistic desired objects, your aspiring desire). Some nominals are more complex, with an accompanying post-modifier (their own definition of a good life), and in some cases the prepositional postmodifier has an embedded complex noun phrase with multiple adjectival premodifiers (a career with a huge financial gain).
Excerpt (9) was produced by a student with little EE exposure, with a total of 2 h per week spent on EE activities (all of that time was spent GAMING). The text has the lowest scores for both ADJECTIVAL MODIFICATION and PREPOSITIONAL MODIFICATION (0.0256 and 0.0) in the whole corpus, together with a fairly low score for LEXICAL DIVERSITY (0.5476). Without further analysis, it is difficult to establish a causal relationship between EE activities and linguistic complexity, but as our results point to a positive relationship between at least some of the EE activities and the measures considered, this student could perhaps have benefitted from more extramural exposure to English.
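To give a rough sense of how a lexical diversity score on a 0 to 1 scale can arise, the following Python sketch computes a plain type-token ratio and a moving-average variant. The exact measure behind the scores reported above is not restated in this section, so the choice of a moving-average type-token ratio, the window size, the tokenizer, and all function names here are assumptions made purely for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    This naive tokenizer is an assumption for illustration only.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(tokens):
    """Proportion of distinct word forms among all tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def moving_average_ttr(tokens, window=50):
    """Mean type-token ratio over every run of `window` consecutive tokens.

    Falls back to the plain ratio when the text is shorter than one window.
    The default window of 50 is a hypothetical choice, not the study's setting.
    """
    if len(tokens) < window:
        return type_token_ratio(tokens)
    ratios = [
        type_token_ratio(tokens[i:i + window])
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Usage: score a short stretch of learner text (taken from Excerpt (9)).
sample = "When I have a bad day she make it good, I can ride in the woods"
sample_tokens = tokenize(sample)
print(type_token_ratio(sample_tokens))
print(moving_average_ttr(sample_tokens, window=5))
```

A windowed measure of this kind is less sensitive to text length than the plain ratio, which tends to fall as texts get longer; whether that matters here depends on how much the learner texts vary in length.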

Discussion and conclusion
This study looked at the effects of five EE activities on three measures of lexical and grammatical features in student writing: lexical diversity and two measures of NP complexity (adjectival and prepositional modification). We used a newly compiled corpus (SLEC), with argumentative texts produced by Swedish learners at intermediate levels, which enabled us to contribute one piece to the puzzle of L2 learners' linguistic complexity. Using measured variable path analysis, we fit three models testing competing hypotheses about the relative importance of classroom factors vs. EE activities on our measures of linguistic complexity. We retained the hypothesis stating that both classroom factors and EE activities have an impact on these features at this stage of writing development. Specifically, we saw that the level of difficulty of the course the students are taking has an impact on lexical diversity. For the EE activities, we did not notice a clear divide between the receptive activities on the one hand and the other activities on the other. Reading had a positive effect on grammatical complexity, while conversation and watching had a positive impact on lexical diversity.
It is well recognized that the field of L2 writing research is as much about the writers themselves (including their previous experiences and their learning contexts) as it is about the writing that they produce (e.g., Hyland, 2019). The current study expanded our understanding of the relationship between the two by tapping into the learners' out-of-school learning context. The fact that we found a positive effect of extramural activities on students' written production suggests that classroom researchers may wish to take this variable into consideration. That is, it seems that more research needs to be done on the impact of extramural activities on language learning and in-class activities. This seems particularly pertinent in countries such as Sweden, where the use of English is widespread in various domains. It is important to keep in mind that the results also showed that classroom factors play a role. Future studies may want to look further into the interplay between (a wider range of) classroom variables and EE exposure to promote learning. Another direction for future research may include complementing self-reported, observational data with other kinds of data (e.g., interviews and/or tests) and including other classroom variables.
Furthermore, the research on the importance of language exposure (e.g., Ellis et al., 2016; Tyler et al., 2018) may have us predict that the more exposure to EE activities the better. However, it is not inconceivable that there is an upper limit, beyond which there is little time left for schoolwork, which may instead have a negative impact on students' L2 development. Future studies may thus want to look into the degree to which there is an "ideal" number of hours spent on any of the activities.
On a similar note, the results of our study also made it clear that it does not seem to be as simple as 'the more extramural activities of any kind the better'. For example, our results suggest that certain kinds of activities were more effective for lexical complexity than for grammatical complexity. More research is needed to further explore the complex nature of the impact of different types of extramural English. Although our study merely scratches the surface of this area of research, we hope to have contributed to the ongoing discussion of the role of EE in L2 use and development (see, e.g., Olsson & Sylvén, 2015; Sundqvist, 2019).
Nonetheless, based on our results, it is clear that students can benefit from EE exposure when it comes to the frequency of use of the lexical and grammatical complexity features under investigation. These results have pedagogical implications. In the Swedish context, the learners' engagement with EE has presented both opportunities and challenges for English learning and teaching. One particular challenge that has been highlighted in the literature is that it can sometimes be difficult to bridge the gap between what is taught in the classroom and what is used outside (see, e.g., Sundqvist & Olin-Scheller, 2013). In an attempt to address this gap, Thorne and Reinhardt (2008) proposed a pedagogical model called Bridging Activities, designed to combine students' voluntary EE activities with teacher guidance. Such a model can be made more effective by research results like ours. In terms of writing, which is the most problematic area for Swedish learners in general, teacher guidance could start from selecting appropriate materials and activities that their students find most engaging and relevant when working on different aspects of language development (e.g., reading activities to target grammatical complexity, conversation and TV/movie watching to target lexical diversity).
We recognize that there are limitations that merit discussion. It should be acknowledged that the EE activities included are broad and cover many different types of uses within each category. Also, given that the study is limited to corpus data, we have not been able to interview the students regarding their perspectives on the perceived usefulness of different types of activities. In addition, the current study is limited to a small selection of complexity features which do not capture the full range of proficiency in academic writing.
All in all, our results concur with previous findings emphasizing that exposure to an L2, be it in the classroom or outside it, remains important for students' L2 written production (e.g., Olsson & Sylvén, 2015; Sundqvist, 2009). We view this study as an essential first step toward a better understanding of the effects of extramural language exposure on L2 written production, and we hope to have inspired further research on this fascinating topic.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. Hours spent per week on the five EE categories.

Table 1
Overview of the data.

Table 2
Classroom variables. EE activities and classroom factors all have an effect on lexical diversity and NP complexity.

Table 3
Common ranges for each fit index.

Table 4
Model comparison.