Exploring the relationship between extramural English, self-efficacy, gender, and learning outcomes: A mixed-methods study in a Norwegian upper-secondary school

This article reports on a mixed-methods study regarding the extent to which the extramural English (EE), external attributions, self-efficacy (concerning EFL reading, speaking, writing, and listening skills), and gender of 42 students, learning English as a foreign language (EFL) in a Norwegian upper-secondary school, predicted their EFL learning outcomes. Data on participants' EE (receptive and productive), external attributions, and self-efficacy were collected through a questionnaire and language diaries, while their learning outcomes were measured through a language proficiency test, mock exam, in-depth project, and receptive and productive vocabulary tests. The data revealed several interesting findings, including participants' receptive EE statistically significantly and negatively predicting their productive vocabulary test scores, whereas their writing self-efficacy and attributions to specifically literature, TV, and film statistically significantly and positively predicted them. Moreover, neither receptive nor productive EE was found to mediate the relationship between self-efficacy, external attributions, and learning outcomes.


Introduction
Students in Norway need to acquire English language skills that will allow them to interact effectively with others nationally and globally in diverse contexts. The English curriculum for schools, published by the Norwegian Directorate for Education and Training (UDIR), states that English "is an important subject when it comes to cultural understanding, communication, all-round education, and identity development" and that it "shall help the pupils to develop an intercultural understanding of different ways of living, ways of thinking and communication patterns" (UDIR, 2022). English is also the only foreign language that all students are required to learn, starting in the second grade. Alongside their Scandinavian counterparts from Sweden and Denmark, students from Norway possess higher levels of proficiency in English than those in most other European countries (EF, 2022), something which researchers (e.g., Bonnet, 2004; Busby, 2021; Olsson, 2012) have attributed to the time students in Scandinavia spend exposed to English outside the classroom. At the same time, this exposure to English outside the classroom, referred to as 'extramural English' (EE) by Sundqvist (2009), has received little attention when compared "to the many thousands of published studies on classroom language learning" (Benson, 2011, p. 8). Indeed, even after a flurry of activity in the years following Benson's (2011) observation (e.g., Brevik, 2019; Peters, 2018; Sundqvist, 2019; Sundqvist & Wikström, 2015; Warnby, 2022), EE remains a relatively new field of research, having been mostly explored in connection with English vocabulary knowledge (e.g., Sundqvist & Wikström, 2015), motivation, and English reading proficiency (e.g., Brevik, 2016; Leona et al., 2021), with few studies (e.g., Schurz et al., 2022) venturing beyond these variables.
Given that a growing body of research recognizes an increase in EE amongst students, for instance, through social media use (Berns, 2007; Norwegian Media Authority, 2020), it becomes necessary to study how and, more importantly, what types of EE (e.g., gaming, watching TV and film, etc.) predict learning outcomes (e.g., writing, reading, vocabulary knowledge, etc.) in learners, with these types preferably explored together in studies rather than in isolation. The value of such research lies in its ability to furnish us with data that would help us to better understand how learners are exposed to English in their lives outside the classroom and how this exposure affects their scholastic achievement, leading to the development of didactic approaches more in sync with EE. The resulting synergies between EE and classroom instruction would contribute to making students' language experiences in formal learning contexts more meaningful (Henry et al., 2018). Of particular interest to OECD countries, including Norway, would be the development of EE-inspired didactic approaches that close the gender gap vis-à-vis scholastic achievement (for a meta-analysis, see Voyer & Voyer, 2014). Although lower scholastic achievement amongst male students in OECD countries, when compared to female students, is not a recent phenomenon, it has become increasingly pronounced (Borgonovi et al., 2018). For example, PISA 2018, like its 2009 iteration, revealed that the gender achievement gap in reading in Norway was higher than the average gap across all OECD countries (see Borgonovi et al., 2018; Frønes et al., 2020). Data collected on students' EE and how it might predict their learning outcomes could thus provide us with insights to use in developing approaches aimed at improving outcomes for all students, especially males, who appear to be falling behind.
This article reports on a twenty-one-month mixed-methods study, conducted at an upper-secondary school in Norway, that examined the extent to which students' EE, gender, self-efficacy (related to their EFL reading, speaking, listening, and writing skills), and external attributions (to diverse activities, overall EE, and school experiences for contributing to their overall proficiency in English) predicted their learning outcomes, as measured via an English language proficiency test administered by the school, a mock exam, an in-depth project, and receptive and productive vocabulary tests. The study also investigated whether EE mediated the relationship between participants' self-efficacy, external attributions, gender, and learning outcomes. Self-efficacy and external attributions were included as variables because of their purportedly positive links to EE and learning outcomes based on the social cognitive model of learning (Bandura, 1986; 1997; see also Henry, 2014), which this study used as its theoretical framework. The study's originality lies in its exploration of the relationship between different EE types, learning outcomes, as measured through multiple forms of formal assessment, and self-efficacy related to reading, speaking, listening, and writing, as well as gender, and external attributions, including the mediating effect that EE has on the relationship between learning outcomes and the other variables mentioned above. The study also placed a stronger emphasis on gathering quantitative data when compared to other EE studies conducted in Norway, where qualitative approaches have sometimes been given greater emphasis than quantitative ones (e.g., Brevik, 2016; 2019). This article also contains didactics-specific recommendations for educational institutions and teachers based on the findings presented herein and offers suggestions for EE research going forward.

Extramural English: contours and gender
In the past decade or so, researchers have proposed several terms and concepts to describe learning that occurs outside the conventional classroom, such as 'out-of-class', 'out-of-school', 'outside school', 'extracurricular', 'after school', and 'extramural', sometimes using these interchangeably and in the context of L2 English learning (e.g., Brevik & Holm, 2022; Lai et al., 2015; Peters, 2018; Sundqvist, 2009). The formality of learning has also been used by a few researchers to differentiate between in- and out-of-school learning, with formal learning typically associated with in-school learning and informal with out-of-school learning. Eshach (2007), for example, presents a useful table outlining formal, non-formal, and informal learning when discussing school visits to a science museum and suggests that the degree of formality is not easily defined with respect to out-of-school settings. Sandlin et al. (2010), in discussing Giroux's public pedagogy, observe that many public spaces, including museums, are effective learning spaces and they implore teachers to think further than the traditional classroom. Several other researchers have similarly encouraged thinking away from the conventional classroom and the 'in-and-out of school' dichotomy, introducing concepts such as affinity spaces (Gee, 2004) and digital wilds (Sauro & Zourou, 2019). They ask us to "look beyond contexts directly embedded within or linked to formal and highly familiar educational institutions and practices" (Sauro & Zourou, 2019, p. 1) and suggest that we should "begin thinking of space as a physical and virtual meld; begin dealing with spaces and groups as squishy and not well-bounded…" (Gee, 2017, p. 28).
To introduce some order and consistency within the field, Benson (2011) introduced the term 'language learning beyond the classroom' to act as an umbrella term. Within this, he identified four major dimensions of language learning from the literature: location, formality, pedagogy, and locus of control, as well as two key analytical constructs: setting and mode of practice. According to Benson (2011, p. 9), the terms "'after-school', 'extracurricular' and 'extramural' usually refer to additional programs in school that are less formal than regular lessons and possibly organized by the students themselves". This differs slightly from Sundqvist's (2009, p. 25) definition of extramural English, which we use as our operational definition in this study, and which is outlined in her doctoral thesis as situations where "the learner comes in contact with or is involved in English outside the walls of the English classroom". Discussions surrounding the appropriateness and contours of the different terms mentioned above notwithstanding, there is a growing sense among researchers and the teaching community about the need to develop a more detailed understanding of learning beyond the conventional classroom, including what it involves and how it relates to dynamics within the classroom. Brevik and Holm (2022, p. 10) suggest that "drawing on students' language profiles allows integration of affinity spaces into L2 teaching in a way that connects with students' language learning outside school". In previous studies, Brevik (2016, 2019) identified three distinct profiles: gamers, surfers, and social media users. Students with a typical gamer profile have been the focus of several studies in Norway (see the references above), being first identified as 'outliers' because of their higher scores in L2 (English) reading proficiency in comparison to their mother tongue (Norwegian), subsequently attributed to extensive gaming.
Studies have also found a positive correlation between playing video games and motivation (e.g., Brevik, 2016; Brevik & Hellekjaer, 2018), as well as vocabulary knowledge (primarily receptive vocabulary), although it should be noted that productive vocabulary tests have been used in some studies (e.g., Sundqvist, 2019; Sundqvist & Wikström, 2015). Aside from gaming, several researchers have argued that out-of-class extensive reading (e.g., Nation, 2015) and TV viewing (Kuppens, 2010; Webb, 2015) are effective means by which to acquire incidental vocabulary (see also Webb, 2020). Studies looking at the relationships between EE and vocabulary, as already mentioned, make up the bulk of research into EE and English language acquisition (e.g., Bollansée et al., 2020; Jensen, 2017; Peters, 2018; Sundqvist, 2009; Warnby, 2022). Indeed, the importance of vocabulary has long been recognized as a key component of L2 English acquisition (e.g., Coxhead, 2000; Nation, 2015) since a significant correlation has been found between general vocabulary knowledge and L2 English achievement, as well as between academic English lexis and academic achievement (Skjelde & Coxhead, 2020). However, as Peters (2018, p. 143) in his study of Flemish students' 'out-of-class exposure' to L2 English points out, little is known about "the relationship between different types of exposure and vocabulary knowledge". Puimège and Peters (2019, p. 3) also conclude that "research has not yet addressed the question of which words are more likely to be picked up from EE", and Schmitt (2019), amongst others, has called for more research looking at how EE can best facilitate vocabulary acquisition. Beyond vocabulary acquisition, however, it also becomes apparent that researchers should start examining interactions between multiple EE types and learning outcomes more comprehensively (and other variables that have been found to predict learning outcomes) to better understand how learning takes place across contexts.
As for gender and EE, several studies have found that boys tend to play more video games and have heavier exposure to English media than girls (e.g., Brevik, 2016; Jensen, 2017; Olsson & Sylvén, 2015; Sundqvist & Wikström, 2015), even if girls outperform boys in scholastic achievement, especially concerning language learning (Voyer & Voyer, 2014). As already stated, few studies have explored the relationship between different types of EE and how these might relate to learning outcomes (beyond just vocabulary), including from a gender perspective. Nevertheless, the fact that boys tend to engage more heavily in gaming and with English media overall, yet still lag behind girls in language learning outcomes, might mean that gaming and English media exposure (i.e., certain EE types) are not the best predictors of learning outcomes. At the same time, other EE types, for example, reading for pleasure (e.g., literature), on which boys spend less time than girls outside school, could have stronger links with learning outcomes. Sundqvist (2009), in her study of the relationship between EE and oral proficiency and vocabulary size among 80 secondary school students in Sweden, found a weak, positive correlation between EE and vocabulary knowledge, as well as a moderate, positive correlation between oral proficiency and EE (albeit only for boys). She suggested that these correlations might have to do with the type of EE that boys typically engage in versus girls, which tends to be oriented more toward productive rather than receptive language skills. Similarly, in a more recent study that differentiated between the scale of social interaction whilst gaming, Sundqvist (2019) claims that gamers (especially of massively multiplayer games) have a more advanced productive vocabulary than non-gamers, as shown in both vocabulary tests and essays. Interesting as this may be, more corroborative evidence is needed to determine whether productive language skills-oriented EE positively correlates with learning outcomes to a greater extent than receptive language skills-oriented EE does. As such, Sundqvist's findings appear to suggest that researchers can obtain deeper insights into the relationship between EE and learning outcomes by categorizing EE based on the type of language skills targeted, as well as through a more nuanced categorization of the EE activity itself. The traditional classification of language skills into receptive and productive should be developed to reflect the evolving concept of language learning beyond the classroom, as discussed in this section.

Self-efficacy, attributions, and extramural English: a theoretical framework
Under a social cognitive model of learning, which emphasizes how people learn and interact with their environment (e.g., through observation and imitation), and how cognitive processes such as attention, perception, and memory shape their beliefs, motivation, and behavior (Bandura, 1986; 1997), self-efficacy and attributions are seen as playing important roles in individuals' regulation of their behavior in formal learning environments and outside the classroom (see Fig. 1) (Henry, 2014). Discussing the potential synergy between EE and classroom practices, Henry et al. (2018, p. 267) note that it can enable "students to experience flexibility, immediacy, and autonomy, and generates opportunities for them to speak and to act as themselves", lead to "positive emotion that, in turn, develops and sustains motivation to engage with learning tasks", and "increase experiences of self-efficacy… and self-authenticity." Based on the model, one can, likewise, hypothesize that students with high self-efficacy may be more likely to seek out and take advantage of opportunities to practice English outside the classroom, such as through gaming or social media, thereby further enhancing their language learning outcomes, including in formal contexts. Due to their self-efficacy, they might also initiate and persist in EE as they believe in their ability to succeed, creating, in this way, a reinforcing cycle of higher self-efficacy, increased EE, and better performance.
Self-efficacy can be defined as beliefs that individuals harbor about their ability to perform specific tasks (Bandura, 1997). Concerning language learning, self-efficacy may be categorized according to language skill (Solheim, 2011; Sun & Wang, 2020), for instance, the extent to which students believe they can write an academic essay in English or read contemporary literary fiction. Research indicates a significant correlation between self-efficacy and learning outcomes (e.g., Chao et al., 2019; Graham, 2011; Mills et al., 2006), though few EE studies (we were only able to locate one where self-efficacy was an explicitly examined variable: Sundqvist, 2009, 2011) have included self-efficacy as a variable, and those that have report inconsistent results for any correlations observed between the two. In the study by Sundqvist (2009, 2011), for instance, self-efficacy statistically significantly (medium strength) correlated with EE for boys but not girls, even if the type of self-efficacy explored in the study focused exclusively on learning English in general (i.e., the items were worded in broad terms, with no mention of language skills or tasks to accomplish). Had there been greater specificity regarding the participants' self-efficacy (e.g., what skill-specific tasks they felt they could accomplish with English), one might have obtained a different set of correlation results. Another predictor of learning outcomes, and one that is related to self-efficacy (Graham, 2011), is attributions. Attributions are individuals' beliefs regarding what causes outcomes (Weiner, 1979). With respect to language learning, attributions might take the form of a student crediting their native speaker friends or multiplayer games such as MMORPGs for advancing their language skills in the target language. They can be internal, meaning that "the behavior is caused by a characteristic of the individual" (Brady & Woolfson, 2008, p. 529), for instance, learners attributing their learning outcomes to their intelligence, or external, meaning that behavior is based on external circumstances like the environment (e.g., EE). Attributions can also be stable or unstable over time, and controllable or uncontrollable (Schunk, 1994).
Attributions have been shown to correlate with motivation and outcomes in learners, though not always (Cochran et al., 2010; Nakamura, 2018), and their interactions with EE, as well as the effect of these interactions on learning outcomes, have been little studied, and then often with a focus on teachers' attributions regarding their students' EE practices (e.g., Schurz et al., 2022; Schurz & Sundqvist, 2022; Sundqvist, 2019). As with self-efficacy, obtaining a better understanding of students' attributions is useful because these can influence the extent to which they engage in certain activities when it comes to learning (Henry, 2014). For example, school students who attribute their success in learning English to playing video games are likely to put more effort into the activity because they feel they will get a strong return on their investment. In doing so, they would also learn English more rapidly, which could then lead to better performance in certain language assessments at school. Meanwhile, a student who feels that school does not contribute to their English ability may not participate actively in class and may expend little energy on tasks because they do not view these as beneficial (and this might hurt their performance in assessments). Gender, too, might have a role to play. Henry (2014, p. 112) believes that "gender roles and gender stereotyping might be highly implicated in the formation of beliefs about learning English", suggesting that boys attribute their English proficiency to out-of-school learning to a greater degree than girls, which can have implications for self-directed learning, motivation, and, ultimately, achievement. These effects notwithstanding, attributions may not always accurately predict learning outcomes, including when EE is involved, though studies on the interactions between attributions, EE, and learning outcomes appear to be absent from the literature at present. As Henry (2014, p. 104) notes, "there may exist a dissonance between the actual effects on proficiency of out-of-school encounters with English, and what students attribute to such experiences", so that, "as with self-efficacy, research is needed on the nature and effects of the attributions students make in relation to the English language skills they develop".

Research questions
Given the limited number of studies on EE overall, especially those that have examined its relationship with self-efficacy, attributions, gender, and, more crucially, learning outcomes, this study investigated the following questions as part of its research focus:
1. Are there any gender differences in participants' EE, self-efficacy, external attributions, and learning outcomes?
2. How do participants' self-efficacy, EE, gender, and external attributions predict their learning outcomes?
3. Does participants' EE mediate the relationship between their self-efficacy, external attributions, gender, and learning outcomes?

Participants and data collection
Guidelines for this research project were provided by Norway's National Research Ethics Committees (NESH, 2021) and ethical approval was obtained from the Norwegian Data Protection Services (NSD). The participants, their guardians, and the school's principal (the study was conducted at one school) were informed of the study and presented with consent forms. It was made clear that participation was voluntary, and although guardian consent was not a requirement given the participants' age, they (i.e., the guardians) were nonetheless encouraged to co-sign the forms. Forty-two upper-secondary school students (19 males and 23 females), from two consecutive first-year groups taking the technical general studies course (TAF), participated in the study. TAF is a hybrid course that contains elements of general and vocational studies. Students obtain a Specialized University College and Admissions Certificate and a Certificate of Completion of Apprenticeship within a four-year study course. TAF students follow the General Studies English Curriculum during their first year alongside General Studies students.
Proficiency test and summative assessment grades on completion are similar amongst the General Studies and TAF classes at the school, with grades roughly mirroring both the county and national averages for these subjects (UDIR, 2023). The TAF course is therefore a suitable option for exploring the links between EE, self-efficacy, external attributions, and EFL learning outcomes since the findings would have relevance for all school learners, regardless of whether they chose academic or vocational English (due to the course's broad scope). Much like in Sundqvist's (2009) study, background information was collected to form an understanding of the participants' 'cultural capital'; the study group was largely homogeneous. Forty participants identified Norwegian as their first language (one also had Lao as a second language), whereas two indicated other first languages (Russian or Flemish). All participants reported using English during travel, which overwhelmingly involved visiting countries in Europe (10 participants had also been to Asia, mostly India and Thailand, or the MENA region, specifically, the United Arab Emirates and Tunisia). Interestingly, just one of the 42 participants had never traveled outside Norway. The research took place over 21 months and used an explanatory sequential approach that contained elements of convergent and case study design.
Data collection involved a mixed-methods approach, with more emphasis placed on quantitative data. Specifically, we used a questionnaire, as well as language diaries with a subset of participants, to explore their EE, self-efficacy, and external attributions, and the marks from an in-depth project, a mock exam, a language proficiency test, and receptive and productive vocabulary tests to assess their learning outcomes. The research timeline and the various instruments are detailed in Table 1. The questionnaire was designed, in part, based on previous studies (Olsson, 2012; Sylven, 2006; Sundqvist, 2009) together with feedback from a pilot study (N = 14 students). It contained 51 items, comprising multiple choice and open-ended questions, which elicited sociobiographical information, EE, self-efficacy regarding EFL reading, writing, listening, and speaking skills, and external attributions (to primary, lower-secondary, and upper-secondary school, as well as to textbooks, literature, TV and film, friends, strangers, YouTube, and EE overall for improvements in English proficiency). Reliability coefficients (McDonald's omega; ω) for EFL reading (ω = .81), writing (ω = .90), listening (ω = .83), and speaking (ω = .82) self-efficacy indicated acceptable internal consistency. Reliability was not calculated for attributions and EE because the items had unique meanings and did not measure the same underlying construct. For EE, when doing data analysis, we categorized the type of exposure based on whether it involved mostly receptive (12 items; e.g., watching TV and film, including news shows, listening to music, podcasts, reading digital or print comics, novels, and short stories) or productive language skills (eight items; e.g., Skype, blogging, tweeting, writing emails, being active on discussion forums and social media platforms such as Facebook and Snapchat) and used composite scores for these two categories when performing statistical tests. We kept gaming as a separate EE category due to data suggesting that it is primarily an activity in which male students engage (Brevik, 2016; Jensen, 2017). Example items from the questionnaire can be found in the Appendix. One of the authors was present when participants filled in the questionnaire to answer any queries.

Note on TAF: also known as YSK. TAF is an acronym for tekniske allmennfag. YSK is an acronym for Yrkes- og studiekompetanse (vocational and general study competence).
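The composite scoring described above can be illustrated with a minimal sketch. It assumes that a composite is the mean of the item ratings in each category, which the text does not specify, and the item keys are hypothetical placeholders rather than the questionnaire's actual 12 receptive and eight productive items.

```python
from statistics import mean

# Hypothetical item keys for illustration only; the real questionnaire
# has 12 receptive and 8 productive EE items.
RECEPTIVE_ITEMS = ["tv_film", "music", "podcasts", "novels"]
PRODUCTIVE_ITEMS = ["emails", "blogging", "forums"]


def composite_scores(responses):
    """Return one score per EE category from 1-5 frequency ratings.

    Assumption: a composite is the mean of the category's items.
    Gaming is kept as its own category, as in the study.
    """
    return {
        "receptive": mean(responses[item] for item in RECEPTIVE_ITEMS),
        "productive": mean(responses[item] for item in PRODUCTIVE_ITEMS),
        "gaming": float(responses["gaming"]),
    }


# Made-up ratings for a single respondent, not data from the study.
example = {
    "tv_film": 4, "music": 5, "podcasts": 2, "novels": 3,
    "emails": 2, "blogging": 1, "forums": 3, "gaming": 5,
}
scores = composite_scores(example)
```

Averaging rather than summing keeps both composites on the original 1-5 scale even though the two categories contain different numbers of items.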
Set A and Set B, which refer respectively to the two consecutive first-year TAF groups (see Table 1), were given the questionnaire in the spring semester, and all students completed it. Students in Sets A and B were also asked to write a language diary for a week. Fourteen students volunteered, but three of the diaries had to be discarded (the participants had not filled in enough days). All students from the same set started their diaries, which were intended to serve as a secondary source of data concerning participants' EE frequency and activity (i.e., they were asked to record hours spent on EE and give a detailed account of their activity in an ordinary week covering three school days, two workdays, and a weekend). However, due to the limited number of language diaries, the data obtained from them were not used during significance testing and are not reported in this article. As already mentioned, we gathered data on participants' learning outcomes via an English proficiency test that all students at the school where the project was conducted must complete at the start of their first (upper-secondary) year. The test, which is marked out of 150 points, gives a comprehensive overview of students' learning outcomes and is divided into three main sections: reading comprehension, writing, and vocabulary. Other sources of learning outcome data included the mock exam (part of the TAF course) and in-depth project (marks for both provided in percentages). The former was a good indicator of students' learning outcomes, including vocabulary knowledge, since it entailed writing a short text and a longer essay that revolve around sociocultural themes at the forefront of events in English-speaking countries (ca. 1200 words). The in-depth project involves students formulating a research question within their vocational field, conducting a literature search, and, if applicable, gathering data from fellow students, culminating in a written report and PowerPoint presentation. The 2000-word level receptive (Webb et al., 2017) and productive (Laufer & Nation, 1999) vocabulary tests were used to assess participants' receptive and productive vocabulary skills.

Data analysis
The data were analyzed in SPSS 28. Mediation analysis (see Fig. 2 for the mediation model) with multiple regression was performed using PROCESS (Hayes, 2017) to evaluate the effects of participants' gender (X_G in Fig. 1), self-efficacy (reading, writing, speaking, and listening), external attributions (covariates C_SEB and C_ATTR respectively), and EE (categorized as 'receptive', 'productive', and 'gaming' and represented by the mediator variable M_EE in Fig. 2) on their learning outcomes (i.e., the outcome variable: Y_EA), as measured by the school-administered proficiency test, mock exam, in-depth project, and vocabulary (receptive and productive) tests.
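To make the logic of a mediation model concrete, the following is a minimal sketch of a single-mediator decomposition using ordinary least squares. It is not the PROCESS procedure used in the study: it omits the covariates and any significance testing, the function names are hypothetical, and the numbers are made up for illustration.

```python
from statistics import mean


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mean_u, mean_v = mean(u), mean(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)


def simple_mediation(x, m, y):
    """Decompose the effect of x on y into direct and indirect (via m) parts.

    a      : slope of m regressed on x
    b      : slope of m in the regression of y on x and m
    direct : slope of x in the regression of y on x and m
    total  : slope of x in the regression of y on x alone
    """
    var_x, var_m = _cov(x, x), _cov(m, m)
    cov_xm, cov_xy, cov_my = _cov(x, m), _cov(x, y), _cov(m, y)
    a = cov_xm / var_x
    denom = var_x * var_m - cov_xm ** 2  # zero only if x and m are collinear
    b = (cov_my * var_x - cov_xy * cov_xm) / denom
    direct = (cov_xy * var_m - cov_my * cov_xm) / denom
    return {
        "a": a,
        "b": b,
        "direct": direct,
        "indirect": a * b,
        "total": cov_xy / var_x,
    }


# Made-up illustrative values, not data from the study.
gender = [0, 0, 0, 0, 1, 1, 1, 1]                  # predictor (X)
ee = [2.0, 2.5, 3.0, 2.5, 3.5, 4.0, 3.0, 4.5]      # mediator (M)
outcome = [60, 65, 70, 62, 64, 72, 66, 75]         # outcome (Y)
paths = simple_mediation(gender, ee, outcome)
```

In this linear setting the total effect equals the direct effect plus the indirect effect (a × b), so a mediator that carries none of the relationship shows an indirect effect close to zero.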
In performing the regression, we checked the data for multicollinearity by calculating the Variance Inflation Factor (VIF) per item, and autocorrelations via the Durbin-Watson statistic (d). A Mann-Whitney U test was done to check for statistically significant differences between participants' self-efficacy, EE, and external attributions based on gender, while a one-way ANOVA was performed to ascertain if there were gender differences concerning participants' learning outcomes, as measured by the proficiency test, mock exam, in-depth project, and vocabulary tests (Levene's test was conducted to determine if both samples had equal variance). An alpha level of .05 was used for all significance testing and the effect size is reported via Hedges' g due to the study's small sample size (partial eta-squared, ηp², is reported as an effect size for the regression). Observed power (1-β) is also provided for all results. When interpreting Hedges' g, .4 represents a weak effect size, .7 a medium effect size, and 1 or over a large effect size (Plonsky & Oswald, 2014). For ηp², .02 represents a small effect size, .13 a medium effect size, and .26 a large effect size (Miles & Shevlin, 2001).

Notes to Table 1: (a) While seven students kept language diaries, one diary had to be discarded. (b) Seven students filled in language diaries. Two of these had to be discarded.
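As an illustration of the effect size reported for the gender comparisons, the sketch below computes Hedges' g for two independent groups. It assumes the common approximate small-sample correction, 1 - 3/(4(n1 + n2) - 9); the study does not state which variant SPSS produced, and the function names are hypothetical. The labels follow the benchmarks cited above (.4, .7, and 1); treating values under .4 as falling below the weak benchmark is an assumption of this sketch.

```python
from math import sqrt
from statistics import mean, variance


def hedges_g(group_a, group_b):
    """Standardized mean difference with a small-sample bias correction."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    cohens_d = (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)  # approximate correction factor
    return cohens_d * correction


def interpret_g(g):
    """Label an effect size using the .4 / .7 / 1 benchmarks cited in the text."""
    size = abs(g)
    if size >= 1:
        return "large"
    if size >= 0.7:
        return "medium"
    if size >= 0.4:
        return "weak"
    return "below the weak benchmark"


# Made-up scores for two small groups, not data from the study.
g = hedges_g([2, 4, 6], [1, 3, 5])
```

The correction shrinks the uncorrected standardized difference slightly, and the shrinkage is larger for smaller groups, which is the usual reason for preferring g in small samples.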

Findings
Table 2 contains descriptive statistics for participants' productive (e.g., Skype), receptive (e.g., watching TV and film), and gaming EE, as well as their self-efficacy and external attributions. As can be seen from the table, participants' receptive EE was higher overall than their productive EE or gaming. As for gender, productive and receptive EE were similar for both male and female participants, whereas gaming was a more frequent activity for male participants. The data also indicated that male participants had stronger self-efficacy for reading, listening, and speaking than female participants, while the two groups' writing self-efficacy was largely similar. Overall, participants were most confident in their listening skills (as evidenced by their self-efficacy), followed by reading, writing, and speaking (in that order). Gender differences can also be seen in participants' external attributions, with female participants crediting textbooks and literature for developing their English proficiency to a greater degree than male participants, who felt more strongly about YouTube helping them with their English proficiency than female participants. In general, participants felt that TV and film contributed to their English proficiency the most, whereas literature, encounters with strangers, and lower-secondary school had the least impact.
Mann-Whitney U test results indicated that gender differences were only statistically significant for gaming and attributions related to YouTube (see Table 2), revealing that male participants engaged in gaming statistically significantly more frequently and believed statistically significantly more strongly that YouTube had contributed to the development of their English proficiency. The effect sizes were weak to medium in strength.
Note (Table 2). g = effect size; 1-β = observed power. a 5-point scale (1 = never, 2 = up to an hour weekly, 3 = 1-5 h weekly, 4 = 5-10 h weekly, 5 = >10 h weekly). b 6-point scale (1 = with great difficulty, 2 = with difficulty, 3 = with some difficulty, 4 = with some ease, 5 = with ease, 6 = with great ease). c 5-point scale (1 = not relevant, 2 = almost nothing, 3 = somewhat, 4 = quite a lot, 5 = immensely).
Table 3 provides descriptive statistics regarding the results of the various tests used to measure participants' learning outcomes. The data show a large gender difference for the mock exam and in-depth project (female participants performed better than their male counterparts), whereas results from the other tests suggest that both groups have similar learning outcomes. Moreover, participants scored higher on the receptive vocabulary test overall than on the productive vocabulary test.
One-way ANOVA test results revealed that gender differences were statistically significant for the in-depth project and mock exam but not for any of the other tests (see Table 3). Effect sizes were of medium strength. Levene's test indicated that there was homogeneity of variance for male and female participants in relation to their proficiency test (p = .71), mock exam (p = .74), in-depth project (p = .28), receptive vocabulary test (p = .64), and productive vocabulary test (p = .15).
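To make the link between the two procedures concrete, the following minimal sketch computes the one-way ANOVA F statistic and the mean-centred Levene statistic, which is the same F computed on absolute deviations from each group's mean. The function names `one_way_f` and `levene_w` are hypothetical, the scores used with them would be invented, and the study's own software and Levene variant are not stated. The p-values would additionally require the F distribution from a statistics library, which is omitted here.

```python
from statistics import mean


def one_way_f(groups):
    """F statistic: between-group mean square divided by within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def levene_w(groups):
    """Levene's statistic (mean-centred variant): ANOVA on absolute deviations."""
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    return one_way_f(deviations)
```

A Levene statistic near zero, with a correspondingly large p-value as in the results above, indicates that the groups' spreads are similar enough for the ANOVA's equal-variance assumption to be tenable.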
Results from the regression, performed to ascertain the extent to which participants' gender, self-efficacy, external attributions (as independent variables), and EE (as a mediator variable) predicted their proficiency test, mock exam, in-depth project, and receptive and productive vocabulary test results (all outcome variables), showed that the model was statistically significant for the proficiency test [R 2 N = .67,F   .20,SE = 4.61,7.19)].
Table 5 contains the parameter estimates for the regression model with the receptive vocabulary test scores as the outcome variable. The data showed that reading self-efficacy and attributions to YouTube statistically significantly and positively predicted receptive vocabulary scores.
Table 6 contains the parameter estimates for the regression model with the productive vocabulary test scores as the dependent variable.

Discussion
This study explored whether there were any gender differences in participants' EE, self-efficacy, external attributions, and learning outcomes (as measured by multiple tests), the extent to which their gender, self-efficacy, EE, and external attributions predicted their learning outcomes, and whether EE mediated the relationship between their self-efficacy, external attributions, gender, and learning outcomes. Before embarking on a discussion of the findings, it is important to address the study's limitations. First, the study employed a small sample size, which affects the generalizability of the findings, though it is worth noting here that the study took place over 21 months and a relatively detailed EE profile of each participant was obtained. Second, our study relied primarily on quantitative data and analysis, unlike other EE studies from Norway (e.g., Brevik, 2016), which meant that we were not able to obtain more detailed insights into individual differences in participants' EE (e.g., whether they all watched TV and film in the same way, including their engagement with language form and function when doing so). Third, the study was conducted at one school in Norway, and it is possible, had other schools from other regions in the country been included, that the results would have been different. Finally, whilst this study chose to employ multiple data collection instruments, it recognizes that replicability of the research design may be difficult in some circumstances and require adjustments. The questionnaire has proved to be an effective means of collecting a broad spectrum of data, being consistently used in EE studies through the past three decades (e.g., Pickard, 1996; Sundqvist, 2009; Sundqvist, 2019), and its design is an important consideration. In contrast, language diaries may hold promise as a data collection instrument but can be problematic for participants to accurately and consistently maintain, especially younger learners. Ultimately, as previously mentioned, studies are needed that go beyond just looking at vocabulary and explore the relationship between different types of EE and other learning outcomes, which can similarly be indicative of language proficiency. These limitations notwithstanding, this study is one of the first to examine the extent to which EE, self-efficacy, external attributions, and gender are predictive of learning outcomes via multiple data collection instruments and an exploration of different EE types, EFL reading, writing, listening, and speaking self-efficacy, and external attributions for English proficiency to both EE and school. The following three subsections are organized according to the study's research questions.

Gender differences in participants' EE, self-efficacy, external attributions, and learning outcomes
The findings revealed that gender differences were present, yet statistically significant in only four instances: attributions regarding YouTube, gaming, and performance on the mock exam and in-depth project (the effect size was of medium strength). Specifically, male participants reported gaming to a greater extent and credited YouTube for their English proficiency more strongly than female participants (see Table 2). Female participants, meanwhile, performed better on the mock exam and in-depth project. Regarding gaming, the findings support the evidence from other studies where boys have been found to play video games more regularly than girls (e.g., Brevik, 2016; Jensen, 2017). Overall, however, discounting gaming, there were no statistically significant differences between the two genders for their receptive or productive EE (the former type of EE being more frequent than the latter), meaning that both boys and girls at the upper-secondary level (at least at the school that participated in the project) had similar EE exposure. This contradicts the data from other studies in the Scandinavian context where boys have been reported to have heavier exposure to English media than girls on average (e.g., Brevik, 2016; Olsson, 2012). More interestingly, the various assessments implemented to gauge participants' learning outcomes revealed that male and female participants had similar levels of receptive and productive vocabulary knowledge (see Table 3). Overall, participants scored better on the receptive vocabulary test than on the productive vocabulary test, which is unsurprising given the general acceptance that language reception is easier than production. The participants also performed equally well in the school-administered proficiency test. The gender differences in the mock exam and in-depth project might, therefore, be due to the format of these assessments.
The in-depth project, for instance, involved research and organization skills while the mock exam required students to reflect on sociocultural themes (see Section 3.1.). The female participants were likely more diligent and efficient during the research phase of the in-depth project and showed greater cultural awareness when tackling the mock exam. This argument finds partial support in the research literature, with studies indicating that females generally have higher levels of cultural empathy, self-control, and self-discipline (these last two would enhance performance in research-oriented tasks) than males (Cundiff & Komarraju, 2008; Solhaug & Osler, 2018), even if both genders are similar in their reflexivity (Thomson & Oppenheimer, 2016). Motivation could also have played a role: studies in Norway have shown wider gender gaps for classroom assessments versus national exams, which may relate to the stakes involved, whereby boys' motivation and effort increase as the stakes rise (as may have been the case with the proficiency test versus the in-depth project and mock exam) (Borgonovi et al., 2018). Meanwhile, statistically significant gender differences in external attributions were limited to YouTube (and participants attributed their English more to EE than to school, especially when considering lower-secondary; see Table 2), with male participants feeling statistically significantly more strongly that the platform contributed to their English proficiency. Jensen (2017), in her study of 107 Danish children, found that gaming correlated statistically significantly with vocabulary performance and that male participants engaged more frequently in gaming than female participants (much like in our study). She observed that male participants might "pay more attention to the language of the games they play and couple their gaming with walkthroughs of gameplay on YouTube as well as other clips, which are able to provide even more appropriate input" (p. 14), leading to improved learning outcomes.

Participants' self-efficacy, EE, gender, and external attributions as predictors of their learning outcomes
The findings revealed that gender and writing and speaking self-efficacy statistically significantly and positively predicted performance in the proficiency test, whereas reading self-efficacy and attributions to YouTube statistically significantly and positively predicted performance in the receptive vocabulary test. Only regarding the productive vocabulary test did EE (receptive) statistically significantly predict performance, and then negatively (and with a medium effect size). As with the proficiency test, writing self-efficacy again correlated statistically significantly and positively with productive vocabulary knowledge, as did attributions to literature, TV, and film. Taken together, the data show that self-efficacy, alongside external attributions, positively predicted performance in all three tests where the regression model was found to be statistically significant. In contrast, EE had no statistically significant relationship with performance, except for productive vocabulary. Gender's effects, too, were limited to the proficiency test, where differences were ultimately not statistically significant (see Table 3). As such, the most powerful predictors of performance in our study were self-efficacy and external attributions, with EE playing a less prominent role and gender being a statistically significant predictor of performance in only one instance. Self-efficacy has already been shown to correlate positively with language learning outcomes (Chao et al., 2019; Mills et al., 2006), with the social cognitive model of learning (Bandura, 1986; 1997) emphasizing that it is not enough for learners to use strategies in their learning or be exposed to a conducive learning environment; rather, they must also believe that they can accomplish the task at hand. Attributions, which, as already mentioned, are related to self-efficacy, function in much the same way (Graham, 2011), and so it was not unexpected to see both variables play a strong, predictive role when it came to participants' performance (even if teachers might not always be aware of these dynamics in students).
What was interesting, however, was that neither listening nor speaking self-efficacy was statistically significantly predictive of performance in any of the tests, perhaps reflecting the test formats, where listening and speaking were not given much weight. Another noteworthy finding was the statistically significant negative correlation found between receptive EE and productive vocabulary knowledge. Swain's (1985; 2001) research shows that students need to produce output alongside receiving comprehensible input to develop their communicative abilities in the target language. If we consider the fact that participants spent more time on receptive EE than productive EE and credited most of the improvement in their English proficiency to watching TV and film, then it comes as no surprise that their receptive EE negatively correlated with their productive vocabulary scores. Moreover, as Schmitt (2008) notes, "It is a commonsense notion that the more a learner engages with a new word, the more likely they are to learn it" (p. 338), which makes incidental exposure much less effective than intentional vocabulary learning where "explicit attention to learning the lexical items themselves" is incorporated into activities (p. 341). This is because, when incidental learning occurs, "learners who understand the overall message often do not pay attention to the precise meanings of individual words", and words that "are easily understood (guessed) from context may not generate enough engagement to be learned and remembered" (p. 341). As such, participants in this study were primarily honing their receptive skills in the language outside the school at the cost of less time allocated to actively and explicitly improving their productive vocabulary knowledge. The findings regarding the negative correlation between receptive EE and productive vocabulary also find partial support in the study by Bollansée et al. (2020), where the researchers found a weak, negative, statistically significant correlation between productive vocabulary knowledge and watching TV with L1 subtitles.

Participants' EE as mediating the relationship between their self-efficacy, external attributions, gender, and learning outcomes
The findings indicated, somewhat unexpectedly, that EE did not play a statistically significant mediating role for any of the measures of assessment used, suggesting that while participants' gender, self-efficacy, and external attributions were directly predictive of performance in some or all the assessments used to measure their learning outcomes, these variables (i.e., gender, self-efficacy, and external attributions) remained independent in this respect from any mediating influence exerted by EE. For instance, leaning on the social cognitive model of learning, we hypothesized that stronger self-efficacy and external attributions should positively correlate with EE (such beliefs would prompt students to practice their English outside the classroom more actively given their increased confidence in their abilities), which would then be reflected in an even stronger performance in the assessments (see Section 2.1.). However, higher self-efficacy regarding English reading, writing, speaking, and listening (and external attributions) did not lead to heightened EE (i.e., students who believed more strongly in their ability to accomplish tasks using their English language skills did not report more frequent EE), which, in turn, did not account for any heightened statistically significant indirect effect on their assessment scores. In other words, participants' exposure to English outside the classroom was perhaps not so much related to their beliefs about their English proficiency as it was to their desire to consume content that happened to be in English (and unavailable in Norwegian or the other languages they knew), that is, they had a pragmatic attitude towards the language during EE. This would corroborate the findings from the study by Brevik (2016), where she found that participants' EE was characterized by simply reading "the information they happened to come across in the language it appeared", so that, when gaming or engaging in other activities, their main focus was not on paying particular attention to the language itself. As Brevik (p. 53) notes, "Although these boys used English in their spare time by choice, they did not see how their English proficiency could be transferred to school activities unless specifically being presented with the idea."
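The notion of an indirect effect can be illustrated with the classic product-of-coefficients decomposition for a single predictor, mediator, and outcome. This is a minimal sketch on hypothetical variables (`x` for a predictor such as self-efficacy, `m` for the mediator such as EE, `y` for a test score), and it is not necessarily the procedure the authors followed. The function name `simple_mediation` is likewise an assumption.

```python
from statistics import mean


def _cov(u, v):
    """Sample covariance with an n - 1 denominator."""
    mean_u, mean_v = mean(u), mean(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)


def simple_mediation(x, m, y):
    """Return the a and b paths, the indirect effect a*b, and the direct and total effects."""
    var_x, var_m = _cov(x, x), _cov(m, m)
    cov_xm, cov_xy, cov_my = _cov(x, m), _cov(x, y), _cov(m, y)
    # Path a: slope from regressing the mediator on the predictor.
    a = cov_xm / var_x
    # Paths b and c': slopes from regressing the outcome on mediator and predictor together.
    det = var_x * var_m - cov_xm ** 2
    b = (cov_my * var_x - cov_xy * cov_xm) / det
    direct = (cov_xy * var_m - cov_my * cov_xm) / det
    return {
        "a": a,
        "b": b,
        "indirect": a * b,
        "direct": direct,
        "total": cov_xy / var_x,
    }
```

In this decomposition the total effect equals the direct effect plus the indirect effect. A non-significant mediating role, as reported above, corresponds to an indirect effect whose interval estimate includes zero, and such an interval would typically be obtained by bootstrapping rather than from the point estimate alone.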

Conclusion
This study explored the interactions between EE, self-efficacy, external attributions, gender, and learning outcomes among upper-secondary school students in Norway based on a social cognitive model of learning. From the study's findings, it becomes apparent that participants are frequently exposed to English outside the classroom yet might not be specifically attuned to noticing the language and using EE to enhance their language proficiency. Nevertheless, given their level of reported EE, it can become a significant source of active language learning if students' ability to notice language (and analyze it more critically) during EE is systematically developed. To support such development, teachers can implement activities in the classroom and as homework to actively monitor and help raise their students' EE-related language awareness while also encouraging them to be more attentive to language outside the classroom. At the same time, before implementing activities, teachers should have a wide-ranging discussion (or a series of discussions) with their students regarding their EE to understand its contours, for example, whether their EE is mostly receptive or productive skills-oriented, as well as specific activities that students engage in. A fundamental part of this discussion should be whether the students consider this in-and-out-of-school synergy as an educational boon or an unwelcome impingement on their free time. If the students appear reluctant, then they need to be made aware of the benefits such an approach could have. In addition, a simple questionnaire can be used to collect this information. In terms of their EE contours, if teachers discover that students primarily engage in receptive EE, which negatively correlated with productive vocabulary knowledge in this study, they can prompt them to switch to more productive EE, with a stronger focus on the language itself (and not just content). Concerning gaming, for instance, teachers could encourage their students to record a segment of their play and then do a walkthrough of the segment where they must describe each element and action on the screen in English (they could either stream the walkthrough live or upload it to a site and present it before the class). These types of activities would prompt the students to focus more on not only what is being communicated but also on how it is being communicated in terms of the language (and how they communicate, in return).
In the classroom, teachers could show their students short clips from a film or TV show in English that they (i.e., the students) like, based on prior discussions with the students about their EE, and ask them to write subtitles in English for them (and then upload these to an online subtitle database). This activity, if done from time to time, would encourage students to more systematically and explicitly connect oral speech with text in English during EE and reflect more deeply on language form. Moreover, teachers do not even necessarily need to implement activities requiring active work with the language, at first; they could, as already mentioned, simply raise students' awareness of the benefits related to more actively noticing language during EE, for example, by discussing how spelling and pronunciation can be improved by paying attention to English subtitles when watching TV and noticing the lexical output, thereby rendering TV as an edutainment tool. The findings also underline the risks of assuming that students who report regularly being exposed to English outside the classroom are actually enhancing their language proficiency through such exposure. For language teachers, this means that they should be careful not to recommend that their students simply 'watch more TV shows in English' or 'listen to music' in the language without also emphasizing the need for students to reflect on (and pay attention to) the use of language in these instances (alongside content). Otherwise, students may not significantly benefit their language proficiency via EE and may incur negative effects in this respect. Furthermore, teachers should also raise students' awareness of the different types of vocabulary they are exposed to whilst involved in EE. Concerning future research directions, it is hoped that studies will start to approach EE through a more robust classification system based on the language skills targeted or other aspects of language learning (e.g., affective factors). This study applied rudimentary categorizations to EE that need to be developed into a more complex, nuanced system. Such a system will in turn lead to more sophisticated instruments for data collection. Future studies could also include an exploration of self-regulation effects and EE, specifically, if these two variables correlate (i.e., whether frequent EE accompanies greater self-regulation seeing as EE comprises generally planned activities where individuals have certain goals, even if these goals are not language-related).

Table 1
Research timeline and instruments.

Table 2
Participants' receptive, productive, and gaming EE based on gender.
Table 4 contains the parameter estimates for the regression model with the schooladministered English proficiency test scores as the outcome variable.

Table 3
Performance on tests based on gender.

Table 4
Regression results for the relationship between gender, EE, self-efficacy, external attributions, and proficiency test scores.
Outcome Variable: Proficiency test
Note. SE = Standard Error; VIF = Variance Inflation Factor; CI = Confidence Interval; LB = Lower Bound; UB = Upper Bound; ηp² = effect size; 1-β = observed power

Table 5
Regression results for the relationship between gender, EE, self-efficacy, external attributions, and receptive vocabulary test scores.
Note. SE = Standard Error; VIF = Variance Inflation Factor; CI = Confidence Interval; LB = Lower Bound; UB = Upper Bound; ηp² = effect size; 1-β = observed power

Table 6
Regression results for the relationship between gender, EE, self-efficacy, external attributions, and productive vocabulary test scores.