Documenting L2 input and interaction during study abroad: Approaches, instruments and challenges

A major rationale for study abroad (SA) from the perspective of second language acquisition is the presumed opportunity available to sojourners for naturalistic second language (L2) “immersion”. However, such opportunities are affected by variations in the linguistic, institutional and social affordances of SA in different settings. They are also affected by the varying agency and motivation of sojourners in seeking L2 engagement. For example, many sojourners prioritize mastering informal L2 speech, while others prioritize academic and professional registers including writing. Most will operate multilingually, using their home language, a local language, and/or English as lingua franca for different purposes, and the types of input they seek out, and language practices they enter into, vary accordingly. Consequently, while researchers have developed varied approaches to documenting L2 engagement, and have tried to relate these to measures of L2 development, these efforts have so far seen somewhat mixed success. This article reviews different approaches to documenting SA input and interaction, beginning with participant self-report, using questionnaires, interviews, journals, or language logs. Particular attention is paid to the popular Language Contact Profile (LCP), and to approaches drawing on Social Network Analysis. The limitations of all forms of self-report are acknowledged. The article also examines the contribution of direct observation and recording of L2 input and interaction during SA. This is a significant alternative approach for the study of acquisition, but one which poses theoretical, ethical and practical challenges. Researchers have increasingly enlisted participants as research collaborators who create small corpora through self-recording with L2 interlocutors.
Analyses in this tradition have so far prioritized interactional, pragmatic and sociocultural development, in learner corpora, over other dimensions of second language acquisition (SLA). The theoretical and practical challenges of corpus creation in SA settings and their wider use to promote understandings of informal L2 learning are discussed.


I Introduction
Second language acquisition (SLA) researchers have been attracted to study abroad (SA) as a site for research on adult second language (L2) learning for some decades, with a notable boost from the 1990s onward (Freed, 1995). While the general claim has been made that "SA research, to a large degree, is built around the idea of usage-based language acquisition" (McCormick, 2018: 39), language learning during SA has been interpreted from a variety of theoretical perspectives, with skill acquisition theory (DeKeyser, 2015), the Interaction Hypothesis (Gass and Mackey, 2020; Long, 1996) and L2 socialization theory (Duff, 2011; Kinginger, 2017) among the most popular. Much SA research has concentrated on documenting L2 proficiency descriptively, collecting L2 output samples from participants using tools such as the ACTFL Oral Proficiency Interview (OPI: Grey, 2018), or with reference to a complexity/accuracy/fluency framework (CAF: Housen et al., 2012). A broad and longstanding research trend makes it clear that an SA experience indeed promotes L2 development, especially in the domains of fluency and vocabulary (Yang, 2016). However, it is also clear by now that SA does not offer a straightforward 'immersion' environment, but on the contrary a highly complex experience with variable language learning outcomes: "One finding that is apparent in nearly every quantitative study is large amounts of variation" (Isabelli-García et al., 2018, p. 445). This variability in language learning outcomes has been attributed to a variety of "individual differences" between participants, including cognitive and social psychological factors such as L2 motivation, anxiety, or working memory (Sanz and Morales-Front, 2018, Part 4), and sociocultural factors such as agency, identity and international orientation (Kinginger, 2017).
It has also been attributed to factors relating to the wider SA context, including the available mix of opportunities for formal instruction and naturalistic acquisition (Briggs, 2015), and residential arrangements (homestays, student residences, shared apartments; Diao et al., 2018; Di Silvio et al., 2015; Kinginger et al., 2016a, 2016b). Linking the individual and the context are the pre-existing social relationships which SA participants bring from home (Hofer et al., 2016), and also the new social relationships into which SA participants may enter abroad, whether with conationals, with a wider international student community, or with locals (Murphy-Lejeune, 2002); these relationships are well recognized as influential for daily language practices (Baker-Smemoe et al., 2014). Finally, more narrowly linguistic factors are in play, including the level of proficiency that participants bring to their SA experience (Issa and Zalbidea, 2018), and the increasingly multilingual nature of the SA experience (Martínez-Arbelaiz et al., 2017; Tullock and Ortega, 2017), including the increasing use of English both as medium of instruction and as global lingua franca (Dafouz and Smit, 2019; Kalocsai, 2013). Each of these factors has been proposed at some point as likely to influence language learning success during SA, but their relative influence and interconnections between them remain less than clear.
Against this complex backdrop, some leading scholars (Duff, 2019; Taguchi and Collentine, 2018) have recently articulated future research agendas for study abroad. We summarize these agendas here, before examining their methodological implications, and reviewing in the main body of the article the available research approaches that seem most relevant to them. Taguchi and Collentine (2018) respond to a state-of-the-art review by Isabelli-García et al. (2018). They propose four research needs, from a broadly sociocognitivist perspective (pp. 553-62):
1. engaging in theory construction that both builds and tests hypotheses;
2. redefining SA outcomes with explicit focus on linguistic and intercultural development combined;
3. understanding the relationship between pre-programmatic linguistic and cognitive abilities and language development while abroad;
4. understanding the effects of social contact while abroad.
Each of these needs is then operationalized into a more specific proposal for a research programme. Thus, to address point 1, which is initially articulated in very abstract terms, Taguchi and Collentine propose mixed-methods research combining both qualitative and quantitative elements. Specifically, they make ambitious proposals for a corpus linguistics approach, developing reference corpora in the target language (TL) for domains of practice typical of study abroad (classroom and academic discourse, service encounters, talk in homestays and student residences, etc.). The linguistic characteristics of these domains would be identified through corpus analysis, and would then guide the assessment of relevant aspects of sojourners' L2 proficiency, i.e. whether "representative linguistic features in that domain" (p. 556) are acquired. To explain the process of acquisition, SA participants' participation in the relevant domains would be studied, through both indirect observational and self-report methods, and through direct micro investigations of "input, output and interaction-based practice" (p. 556).
Research tasks 2-4 derive more transparently from the stated research needs. Research task 2 involves developing a definition of intercultural competence and valid social psychological instruments for measuring this. Research task 3 involves testing whether and how a range of individual difference factors may interact with linguistic development during SA and in particular their relationship with possible threshold levels of L2 proficiency; the discussion centres on initial L2 proficiency levels, and on cognitive factors such as working memory. Finally, research task 4 proposes the adoption of social network analysis (SNA) to model students' contact with L2 speakers and their incidental exposure to L2.
Unlike Taguchi and Collentine, Duff (2019) acknowledges explicitly the multilingual and ideological nature of the SA experience, and her research proposals derive primarily from a language socialization perspective. For Duff, four main themes require investigation:
1. SA participants' identity, agency and investment in L2 learning / in multilingualism; their social networks and language practices when abroad, and how these contribute to L2 development;
2. ideologies and discourses associated with home and host settings, regarding language learning and its contribution to participation in the global economy;
3. language socialization and face-to-face interaction in varied SA settings (the classroom, the place of residence, leisure settings, etc.), and the contributions of interactional input and feedback to L2 development;
4. domains of proficiency, how these are prioritized during SA, and the relative values attaching to these. (Duff, 2019: 14-15)
These two sets of research objectives, coming from clearly different theoretical orientations, are somewhat different in how they view the SA linguistic environment, as well as the individual participant. Most noticeably, while Duff acknowledges the multilingual nature of the SA environment, and its structuring in terms of ideologies and discourse, these issues are not explicitly acknowledged by Taguchi and Collentine. However, they all share a concern to understand the linguistic environment at more than one level. First, all refer to a need for better understanding of the "social networks" (Duff) and "social contacts" (Taguchi and Collentine) in which sojourners engage, and the consequences for their general "language practices" (Duff). Second, they all acknowledge the need for better understanding of the immediate face-to-face interactions in which sojourners engage, making explicit reference to constructs deriving from the Interaction Hypothesis (input, output, feedback).
Finally, they raise the need for a better understanding of the different "domains of [linguistic] practice", which are characteristic of SA, and their relationship with L2 proficiency development. L2 proficiency itself is clearly to be interpreted broadly, to include the development of sociopragmatic or interactional L2 competence (Taguchi, 2015), alongside more narrowly linguistic proficiency (lexis, grammar, etc.).
In the following sections we examine the tools and approaches which are most relevant to these particular SA research aims concerning input and interaction. We divide our discussion into indirect research approaches, which essentially collect and interpret SA participants' self-reports regarding their social relationships and language practices, and direct approaches, which collect and analyse samples of naturally occurring L2 input and interaction in SA settings.

II Self-report and indirect approaches to documenting SA input and interaction
This section reviews instruments used to capture participants' perceptions and accounts of the language input they receive, and the social interactions they engage in, during SA. (An earlier review of quantitative approaches is provided by Dewey, 2017.) Underlying all of this work is first of all the general assumption that the SA environment is rich in meaningful TL input, spoken, written, multimodal and/or online, but that SA participants may engage with these affordances in different ways, or indeed may neglect them in favour of first language (L1) or other languages, so that different patterns of engagement may have different language learning consequences. A second major assumption is that the quality of TL interaction available for sojourners (for many researchers, conceived in terms of the Interaction Hypothesis, with reference to negotiation of meaning, noticing, pushed output and/or usable feedback) is dependent on the type of social relationships they develop with TL users, so that study of these relationships can provide indirect insights into language learning affordances.
The approaches to be discussed range from quantitative tools (primarily questionnaires) to qualitative tools such as interviews and participant journals. They may be retrospective, collecting participants' recollections only after the return home from abroad; data may also be collected in sojourn, with varying frequency, e.g. asking participants to describe their language use patterns over the previous day, week or month. They all have in common a reliance on participants' ability to recall and report on general patterns, and/or to narrate particular incidents involving TL use. As the researchers using such instrumentation generally acknowledge, such accounts inevitably involve an element of subjectivity, and cannot capture the fine detail of input or of interaction, nor the actual language processing which converts input into intake, mostly at a subconscious level (see, for example, Collentine, 2011). Despite these limitations, researchers have set to work on the assumption that participant self-report is at least dependable enough to distinguish between L2 use patterns rich enough to support L2 development, and more impoverished patterns; the following discussions illustrate the problematic nature of this assumption.

Language use questionnaires
The best known questionnaire developed to document L2 use patterns during SA is the "Language Contact Profile" (LCP) promoted by Freed from the 1990s, and published by Freed et al. (2004) with reference to Spanish as TL. This questionnaire is in two versions. The "pretest" (i.e. predeparture) version elicits background information including details of all languages known/studied, plus information on respondents' recent use of Spanish. The "posttest" (i.e. post-SA) version includes the following sections:
• living arrangements in-sojourn;
• speaking Spanish in-sojourn (with whom, 9 items; for what purposes, 4 items; focus on form, 2 items);
• reading Spanish (6 items);
• listening to Spanish (5 items);
• writing Spanish (5 items);
• using English/other languages (7 items).
Figure 1 presents an extract from the "posttest" LCP to illustrate how participants are expected to report retrospectively on their use of different language skills when abroad; the figure shows selected items concerning listening, but similar scales are used throughout. Clearly, the validity of the instrument depends on participants' capacity to estimate retrospectively with reasonable accuracy both the frequency and duration of their SA language practices. Yet, this is a problematic assumption: for example, there is a large body of psychological research showing that memory of event duration is influenced by levels of attention, active engagement and related emotions (Brunec et al., 2017; Grondin, 2010).
The LCP has been used in a wide range of SA studies, often in adapted form (see reviews in Dewey, 2017; Fernández and Gates Tapia, 2016; Freed et al., 2004). Reliability statistics for LCP itself are not usually reported, though the relationships between LCP scores and pre- and posttests in the TL are explored statistically. However, studies employing it have produced mixed findings concerning relationships between language contact as measured using LCP, and L2 gains. For example, Hernández (2010) administered a version of the ACTFL Oral Proficiency Interview (OPI) in Spanish (pre- and post-SA), plus a modified LCP to 20 students undertaking SA in Spain, and found that LCP scores predicted L2 gains between pre- and posttests. Dewey (2008) found similarly that time spent in L2 face-to-face interaction reported through the LCP correlated positively with L2 vocabulary development. However, in a similar study, Magnan and Back (2007) tracked 24 American students of languages undertaking a one-semester sojourn in France. These participants also completed the OPI, pre- and post-sojourn, and a version of the LCP post-sojourn. In this study no significant relationships were found between LCP responses and proficiency gains; the main predictor of L2 development was the level of proficiency pre-sojourn, and the researchers argued that lower starting proficiency may have meant that participants could not make optimal use of the language input available to them (e.g. in homestays). However, they also acknowledged possible issues with LCP dependability, given its retrospective nature, and argued in favour of "a continuous logging of activity" as a possible improvement.
Figure 1. Extract from the "posttest" LCP (selected listening items). Source. Freed et al., 2004, p. 355.
7. How much time did you spend doing each of these activities out of class?
7g. Overall, in listening to Spanish outside of class
    Typically, how many days per week? 0 1 2 3 4 5 6 7
    On these days, typically how many hours per day? 0-1 1-2 2-3 3-4 4-5 5+
7h. Listening to Spanish television and radio outside of class
    Typically, how many days per week? 0 1 2 3 4 5 6 7
    On these days, typically how many hours per day? 0-1 1-2 2-3 3-4 4-5 5+
7i. Listening to Spanish movies or videos outside of class
    Typically, how many days per week? 0 1 2 3 4 5 6 7
    On these days, typically how many hours per day? 0-1 1-2 2-3 3-4 4-5 5+
7j. Listening to Spanish songs outside of class
    Typically, how many days per week? 0 1 2 3 4 5 6 7
    On these days, typically how many hours per day? 0-1 1-2 2-3 3-4 4-5 5+
7k. Trying to catch other people's conversations in Spanish outside of class
    Typically, how many days per week? 0 1 2 3 4 5 6 7
    On these days, typically how many hours per day? 0-1 1-2 2-3 3-4 4-5 5+

A qualitative investigation into the usefulness of the LCP is reported by Fernández and Gates Tapia (2016). One of the researchers (Fernández) accompanied a group of 12 US learners of Spanish on SA in Argentina, and collected a range of qualitative data alongside administration of a version of LCP, including repeated individual interviews and participant observation. Additionally, the LCP itself was modified to include qualitative comments on participants' responses, and their thoughts and queries while completing LCP were also audiorecorded. Partly by triangulation with their qualitative data, and partly by analysis of the internal consistency of LCP responses, these researchers concluded that participants were interpreting LCP questions in idiosyncratic ways, and also producing internally inconsistent responses; notably, questions about the global use of Spanish produced lower claims in terms of time spent, than did questions about particular tasks performed in Spanish, or about interactions with particular interlocutors (suggesting that duration was better estimated for more specific events, in line with the conclusions of Brunec et al., 2017, and Grondin, 2010). In addition, LCP failed to capture changes over time. For example, some participants reported in interviews a gradual development in their ability to hold more meaningful and sustained conversations. Fernández and Gates Tapia argue that a one-time questionnaire cannot capture adequately the match/mismatch of available input with learners' particular developmental needs, and that other research approaches are required to achieve this.
Some of these issues have been addressed by other questionnaire writers. Dewey et al. (2012) tried to combat "inflation of responses" (p. 120) by using an online questionnaire format which immediately calculated and displayed totals for claimed L2-using hours. Others have requested relative rather than absolute frequency judgements concerning different types of language use, and/or administered their instrument repeatedly in-sojourn (Briggs, 2015; Martínez-Arbelaiz et al., 2017; Mitchell et al., 2017, Chapter 7). For example, Mitchell et al. (2017) tracked a group of 56 language specialists through a two-semester sojourn in France, Mexico or Spain. Their Language Engagement Questionnaire (LEQ) asks participants to score a range of activities on a six-point frequency scale, for every language that they know; this instrument proved statistically reliable and was administered on three occasions in sojourn. (For the Spanish section, see Figure 2.) Mitchell et al. judged that the LEQ provided usable descriptive information regarding participants' preferences for L2-using activities (e.g. favouring face-to-face interactions and media use over paper-based reading or writing), for comparing preference patterns across languages, and regarding the impact of the wider context on language use. For example, where participants were housed with locals, and there were few other international students, reported L2 use was highest (Mexico); where participants generally lived with co-nationals or other international students, and where the international student community was very large, L2 use was lowest (France). Additionally, they found significant positive correlations between LEQ scores and a measure of TL fluency development (gains in speech rate), though not with other measures (Mitchell et al., 2017, Chapter 9). However, again, Mitchell et al. acknowledge the need to adopt complementary research approaches, to address issues of quality of L2 input and interaction.
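Returning to the LCP-style items that pair a days-per-week question with an hours-per-day band, the arithmetic behind a "total claimed hours" figure, and behind a simple internal consistency check of the kind discussed above, can be sketched in a few lines of Python. This is an illustrative sketch only, not a scoring procedure published with the LCP or used in any study reviewed here. The band midpoints, the value assigned to "5+", the function names and the sample responses are all our own assumptions.

```python
# Hypothetical midpoints (hours per day) for the LCP response bands.
# The LCP does not prescribe these values; "5+" is arbitrarily set to 5.5.
BAND_MIDPOINTS = {
    "0-1": 0.5, "1-2": 1.5, "2-3": 2.5,
    "3-4": 3.5, "4-5": 4.5, "5+": 5.5,
}


def weekly_hours(days_per_week, hours_band):
    """Estimate weekly hours as days per week times the band midpoint."""
    if not 0 <= days_per_week <= 7:
        raise ValueError("days_per_week must be between 0 and 7")
    return days_per_week * BAND_MIDPOINTS[hours_band]


def consistency_gap(global_item, specific_items):
    """Return the summed specific-item estimate minus the global estimate.

    A positive value means that the specific activities together claim
    more time than the global item does.
    """
    global_total = weekly_hours(*global_item)
    specific_total = sum(weekly_hours(d, b) for d, b in specific_items)
    return specific_total - global_total


# Invented responses for one participant: (days per week, hours band).
overall_listening = (5, "1-2")       # item 7g (global)
specific_listening = [
    (4, "0-1"),                      # 7h television and radio
    (2, "1-2"),                      # 7i movies or videos
    (7, "0-1"),                      # 7j songs
    (3, "0-1"),                      # 7k overheard conversations
]

print(weekly_hours(*overall_listening))                        # 7.5
print(consistency_gap(overall_listening, specific_listening))  # 2.5
```

In this invented case the specific listening items sum to 10 hours per week against a global claim of 7.5 hours, the direction of mismatch reported by Fernández and Gates Tapia (2016). Because the result depends heavily on the assumed midpoints, any such total is better read as a rough ordering device than as a measurement.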

Language logs
In an attempt to improve the reliability of self-report regarding L2 use, and to address possible changes over time, some SA researchers have adopted language logs. For example, Brecht and Robinson (1993) asked their participants (undertaking SA in Russia) to complete hourly calendar diaries concerning their out-of-class activities, their companions and the language being used, for a week at a time. Development of a reliable coding scheme for the resulting data posed considerable challenges, because of the very variable level of detail provided by the participants, and involved collating of particular reported "events" into 11 "general situations", as well as estimations of event duration by culturally informed coders. However, qualitative analysis indicated that participants with higher initial proficiency in Russian spent more time in Russian-using activities, and also made greater gains in L2 Russian. Ranta and Meckelborg (2013) drew inspiration from Brecht and Robinson (1993), but developed a computerized Language Activities Log, which their participants (Chinese students in Canada) were asked to complete daily, for one week per month. This log required data to be provided for 15-minute blocks of time, but this task was made easier as participants selected their activities, their companions, and the language(s) being used, from drop-down menus, rather than writing them in. (For an illustration of these menus, designed following interviews with a pilot group, see Figure 3.) This study did not attempt to relate language use patterns to L2 development, but the researchers were concerned by what they viewed as participants' relatively low (and declining) levels of spoken interaction in L2 English, reflecting expectations deriving from the Interaction Hypothesis, i.e. that high levels of interaction would promote L2 development.
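To make concrete how entries recorded in 15-minute blocks might be turned into summary measures of language use, the following Python sketch aggregates a small invented log. It is not the software used by Ranta and Meckelborg (2013); the field names ("activity", "companion", "language"), the function names and the sample entries are hypothetical.

```python
from collections import defaultdict

BLOCK_MINUTES = 15  # block length described for the Language Activities Log


def hours_by_language(entries):
    """Sum logged time per language, in hours.

    Each entry is a dict with the hypothetical keys "activity",
    "companion" and "language"; one entry stands for one 15-minute block.
    """
    minutes = defaultdict(int)
    for entry in entries:
        minutes[entry["language"]] += BLOCK_MINUTES
    return {language: total / 60 for language, total in minutes.items()}


def share_of_language(entries, language):
    """Proportion of logged blocks in which the given language was used."""
    if not entries:
        return 0.0
    matching = sum(1 for entry in entries if entry["language"] == language)
    return matching / len(entries)


# Invented log for part of one day (not data from the study).
log = [
    {"activity": "lecture", "companion": "classmates", "language": "English"},
    {"activity": "lecture", "companion": "classmates", "language": "English"},
    {"activity": "lunch", "companion": "friends", "language": "Chinese"},
    {"activity": "lunch", "companion": "friends", "language": "Chinese"},
    {"activity": "lunch", "companion": "friends", "language": "Chinese"},
    {"activity": "shopping", "companion": "alone", "language": "English"},
]

print(hours_by_language(log))             # {'English': 0.75, 'Chinese': 0.75}
print(share_of_language(log, "English"))  # 0.5
```

Counting blocks rather than asking for free estimates of duration moves the estimation burden from the participant to the instrument, although the resulting totals remain self-report and can only be as accurate as the block-by-block entries.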
Figure 2. Spanish section of the Language Engagement Questionnaire (LEQ). Source. Mitchell et al., 2017, Chapter 7.

Again in response to the perceived limitations of LCP, García-Amaya (2017) designed a mobile phone app to collect daily information regarding use of L1 English and L2 Spanish, for a group of US students in Spain (and living in homestays). The Daily Linguistic Questionnaire divided the day into four quarters and, for each one, participants recorded the activities they engaged in, and estimated (in minutes) the amount of time spent speaking/reading/writing/listening to Spanish and English. The 43 participants were prompted to complete the questionnaire once a day for six weeks, with a 75% completion rate. The main finding was a reported decline in use of L2 Spanish over time (as interactions with host families declined and interactions with peers increased). Seibert Hanson and Dracos (2019) describe another mobile phone app, used daily by 11 SA participants on a six-week programme in Argentina to report their use of L1 and L2 digital media. Participants recorded their estimates (in minutes and hours) of time spent over the whole day using the following headings:
1. spoken with native Spanish/English speakers in person;
2. read online or via tablet the news, novels, web pages in Spanish/English;
4. read e-mail, Facebook, Twitter, texts in Spanish/English;
5. watched or listened to television, movies, radio, and music in Spanish/English;
6. written texts in Spanish/English.
Absolute time estimates varied widely, but the approach seemed successful in documenting relative amounts of time spent online in each language, including change over time, and the balance of online activities. The researchers did not find any significant relationships between time spent using Spanish online, and gains on grammar and reading comprehension tests; it is of course possible that a larger/longer study might have produced clearer results. Dewey (2017) reviews a number of further studies of this type. He acknowledges that the immediacy of recall involved with log completion may be an advantage over LCP-type self-report, but concludes overall that greater "consistency, control, and comparison" are required (p. 53), before logs can be preferred to other forms of self-report on language use. For all types of self-report instrument, reporting of reliability statistics and discussion of validity of data is rare.

Modelling social networks
In addition to seeking self-reports on language use during SA, researchers have consistently shown interest in the social relations which sojourners develop when abroad, especially with locals, but also with fellow nationals, or with international peers. The rationales for this interest in social networking are largely to do with access to L2 input (the assumption being that stronger networking with locals will lead to richer L2 input/interaction). As Isabelli-García puts it: "Learners in extended networks with native speakers will acquire a set of linguistic norms that are enforced by exchange with those native-speaker contacts" (2006: 236). She references mechanisms of "noticing", "scaffolding" and "restructuring" in support of this view. However, rationales also relate to SA researchers' interest in sojourners' identity development, and/or their acquisition of intercultural competence (García-Nieto, 2018). Some SA researchers have theorized this issue in terms of the communities of practice in which sojourners engage (e.g. Umino and Benson, 2016; Zappa-Hollman and Duff, 2015). Others, however, have turned to various versions of social networking theory (Milroy, 1987), and have adopted tools designed to capture formally different aspects of sojourner networks (their size, density, multiplexity, dispersion, etc.), and to represent these graphically. (Potentially relevant literature has to be reviewed with caution, however, as quite a few researchers use the term "social networking" informally to refer to social relationships, without undertaking any actual network modelling.) A pioneering small-scale study modelling sojourners' social networks was carried out by Isabelli-García (2006). She documented the Spanish-speaking contacts of four US students sojourning in Argentina, through "network contact logsheets" which participants completed on three occasions.
Participants' networks were modelled graphically, drawing on the log information but also on interviews and journals. Analysis showed that two of her case study participants succeeded in entering multiplex Spanish-speaking networks over time, and made greater linguistic advances than those who did not do so.
A significant advance in the modelling of social networks was the development of the Study Abroad Social Interaction Questionnaire (SASIQ) by Dewey and associates (Dewey et al., 2012, 2013). The SASIQ is designed specifically to enable the modelling of different social network types. It first invites the respondent to identify their individual acquaintances in sojourn. To capture network intensity, each named acquaintance is rated for degree of friendship. To capture network durability, questions are asked about frequency of interaction (in L2 and/or in L1); to capture network density (degree of interconnectedness among acquaintances), participants arrange their acquaintances into social groups. The number of different groups identified provides a measure of dispersion; finally, the SASIQ also asks about the English proficiency of network members who are speakers of other languages. This tool has been used in several empirical studies by Dewey and his group, where various measures of L2 development were also administered. For example, a quantitative study by Baker-Smemoe et al. (2014) tracked the L2 development of over 100 Anglophone SA participants, in six different locations. The language measure used was the OPI, used as pre- and posttest, and participants also completed the SASIQ as well as measures of intercultural sensitivity and of L2 use (in this case, a language log: Martinsen, 2010). For subsequent analysis, the cohort was divided into two groups, those who progressed at least one sublevel on the OPI (the "gainer" group), and those who did not (the "nongainers").
Statistical analyses showed that the best predictors of learning gains were pre-program L2 proficiency (with lower initial proficiency predicting higher gains), intercultural sensitivity, and aspects of social networking, specifically a reduction in network size over time, greater network dispersion (membership of different groups), and the level of English knowledge among network members (but not reported L2 use). The authors interpret these findings to suggest that the development of close social relations with TL speakers, even through English initially, gives access over time to TL input and interaction opportunities which are well tuned to learner needs, a "quality" dimension which is not captured by language use measures. SASIQ has also proved useful beyond the sphere of Anglophone SA research; Baten (2020) has used a modified version of SASIQ successfully to study social networks and related multilingual practices among L1 Dutch students undertaking SA through the ERASMUS program in a range of European settings.
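The network measures described for the SASIQ (size, dispersion as the number of social groups, and intensity as rated friendship) lend themselves to a brief illustration. The Python sketch below is not the SASIQ scoring procedure. The data layout, the 1-10 friendship scale, the function name and the sample respondents are assumptions made purely for illustration.

```python
from statistics import mean


def network_summary(acquaintances):
    """Summarize one questionnaire administration.

    Each acquaintance is a dict with the hypothetical keys "name",
    "friendship" (a rating; the 1-10 scale is an assumption) and
    "group" (the social group to which the respondent assigned them).
    """
    groups = {person["group"] for person in acquaintances}
    ratings = [person["friendship"] for person in acquaintances]
    return {
        "size": len(acquaintances),       # number of named acquaintances
        "dispersion": len(groups),        # number of distinct social groups
        "mean_intensity": mean(ratings) if ratings else 0.0,
    }


# Invented responses from one sojourner, early and late in the stay.
early = [
    {"name": "P1", "friendship": 3, "group": "programme"},
    {"name": "P2", "friendship": 4, "group": "programme"},
    {"name": "P3", "friendship": 2, "group": "programme"},
    {"name": "P4", "friendship": 5, "group": "host family"},
    {"name": "P5", "friendship": 6, "group": "programme"},
]
late = [
    {"name": "P4", "friendship": 7, "group": "host family"},
    {"name": "P6", "friendship": 8, "group": "sports club"},
    {"name": "P5", "friendship": 6, "group": "programme"},
]

before, after = network_summary(early), network_summary(late)
print(before)
print(after)
print(after["size"] - before["size"])  # change in network size: -2
```

The invented respondent shows a smaller but more dispersed and more intense network at the later time point, which is the kind of profile that Baker-Smemoe et al. (2014) associated with the "gainer" group. The sketch describes a single respondent and makes no claim about the statistical models used in that study.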
A recent social networks study reported by Gautier (2019) introduces somewhat different procedures, current in sociological studies of social networks, and including graphical representation of individual SA participants' ego networks. Gautier followed 29 participants at a French university over an academic year (15 American, 14 Chinese). On three occasions the participants completed a week-long log of their conversational interactions (face-to-face or on the phone), noting interlocutors' names as well as duration, place and language, throughout each day. The logs informed qualitative interviews on personal relations and social practices. Gautier then modelled individual networks using cluster analysis. The analysis focused on structural aspects including the size and number of ties within a network, plus measures of density (the proportion of reciprocal links), and centrality (the variable importance of particular individuals within the network). Four classes of network members were distinguished: "originals" (co-national mobile students); "national peers" (co-nationals permanently resident in France); "hosts" (French nationals residing in France); and "transnationals" (other nationals residing in France).
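Structural measures of this kind can be illustrated on a toy network. The Python sketch below uses the common graph-theoretic definitions of density (ties present as a proportion of all possible ties) and degree centrality for an undirected network with the focal participant omitted. These definitions are an assumption on our part and may differ from the operationalization in Gautier (2019); the member labels and ties are invented.

```python
def density(members, ties):
    """Proportion of possible undirected ties that are actually present."""
    n = len(members)
    possible = n * (n - 1) / 2
    return len(ties) / possible if possible else 0.0


def degree_centrality(members, ties):
    """Share of the other members to whom each member is directly tied."""
    degree = {member: 0 for member in members}
    for tie in ties:
        for member in tie:
            degree[member] += 1
    others = len(members) - 1
    return {m: (d / others if others else 0.0) for m, d in degree.items()}


# Invented network of four contacts; each tie is an unordered pair.
members = ["A", "B", "C", "D"]
ties = {
    frozenset(pair)
    for pair in [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
}

print(round(density(members, ties), 2))      # 0.67
print(degree_centrality(members, ties)["A"])  # 1.0
```

Here contact A is tied to every other member and so has the maximum centrality, while four of the six possible ties are present. Storing ties as unordered pairs means that reciprocity is not modelled; a directed representation would be needed to examine reciprocal links specifically.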
Gautier's in-depth approach allowed her to distinguish five different personal network types, two of them more characteristic of American participants ("Dense" and "Extended"), two of the Chinese participants ("Concentrated" and "Dissociated"). The fifth "Eclectic" network type is illustrated in Figure 4. (Four American and three Chinese participants had networks of this type at some point; links to the target participant himself are not shown.) This network includes a relatively dense section (seen on the right hand side), mainly involving transnationals plus a few co-nationals. The centre shows a looser section, but including a high number of hosts; on the left are a variety of isolate pairs. Eclectic networks were characterized by the highest amount of French conversation, and seemed to represent the most successful social integration, which Gautier attributes primarily to individual motivations to "get closer to host Alter" (p. 231). Like the work of Dewey and associates, Gautier's methodology adds depth and precision to the modelling of social relationships and has potential to make specific predictions regarding the relationship between social networking and L2 development. A further study by Hasegawa (2019) models similarly the social networks of American sojourners in Japan, and complements this with direct recordings of interactions among selected network members.

Interviews and journals
Researchers adopting a longitudinal perspective on SA, and in particular those who have aimed to move beyond a pre-post approach to documenting language gain and its relationship with the SA experience, have frequently elicited qualitative accounts of language engagement and social relationships in sojourn, through a combination of tools such as participant journals and individual interviews, or through analysis of sojourners' activities on social media (Dressler and Dressler, 2016). For example, an influential ethnographic study by Kinginger (2008) used biweekly journals and pre-, mid- and post-sojourn interviews, in addition to language logs and a range of language tests, in order to generate a series of narrative case studies of American students undertaking a one-semester stay in France. Framed through sociocultural theory, this study documented how student agency and personal learning goals interacted with their social experiences, to impact on their use of French and L2 achievement over time.
Source. Gautier, 2019, p. 228.
The conduct of regular interviews in sojourn, as a source of narrative accounts of L2 experience, has been imitated in numerous later multimethod studies. For example, Fernández (2013) complemented use of LCP, participant observation and the direct recording of participant L2 interactions (discussed further below), with three in-sojourn interviews concerning both language practices and social networking. Taguchi (2015) combined a series of three in-sojourn interviews focusing on "communication and cultural contact" with use of LCP and a motivation questionnaire. Sauer and R Ellis (2019) tracked two German adolescents attending a New Zealand high school for six months, and conducted monthly interviews on these topics in addition to eliciting bimonthly blogs on the participants' language experiences, and monthly "reflective reports" on intercultural issues. Mitchell et al. (2017) interviewed their participants on three occasions in sojourn, in addition to pre- and post-sojourn interviews, and administration of LEQ and a Social Networking Questionnaire. One feature of these interviews was an invitation to relate language-related "memorable incidents", which often produced emotionally charged anecdotes of experiences of (mis)understandings, (non)problem-solving, or the building/breakdown of relationships through L2 (similar to the ethnographic "rich points" of Kinginger, 2008, p. 7).
Images may also be used to document SA relationships, and to stimulate self-reflection by participants. Guichon (2019) described a smartphone app, which SA participants can use to take photos and add commentaries on their SA experiences, so creating a series of multimodal blogs. Umino and Benson (2016) conducted extensive interviews with an Indonesian student (Iwan) studying in Japan, in years 3 and 4 of his four-year stay. These interviews adopted a "life history" approach, and drew on the student's own photograph collection to illustrate his long-term social trajectory, summed up by the researchers as "a trajectory that moved from participation in communities that were institutionally organized and mainly involved international student groups to participation in self-organized communities that mainly involved local Japanese groups" (2016: 762). In this last phase, Iwan commented on his own sociopragmatic development:

In October, I met Yasu (a Japanese classmate) on the street near my house. He said he wanted to visit my place so I invited him [. . .] Other students also came along, and my house became a salon for my circle of friends. After I started socializing with this group of friends, I think I began learning how to initiate conversations in Japanese. For example, you should not begin with big topics but start with small talk and gradually expand upon it. In Japan, you should not make jokes when meeting for the first time. It does not work. Now I feel like I know how to talk with new people, and how to express myself. Now I can express what kind of person I am in Japanese whereas before I could not. (Umino and Benson, 2016: 767)

Here we find evidence for explicit participant perceptions of strong links between the quality of social relationships and the acquisition of appropriate language practices. However, even in such detailed qualitative case studies, the language practices themselves are not directly documented.

III Direct approaches to documenting input and interaction during SA
As we saw in the introduction, the reviews of both Duff (2019) and Taguchi and Collentine (2018) call for the direct documentation of TL input and interaction during SA. They present two principal motivations for this. First, they argue for analysis of SA interactions, through the lens of the Interaction Hypothesis, examining such features as meaning negotiations, pushed output, noticing, and corrective feedback, and their contributions to L2 development. Second, they argue for a better understanding of the various "domains of practice" encountered during SA, and analysis of how SA participants develop relevant sociopragmatic and interactional competence through participation in these domains. Clearly, much detailed language processing in these naturalistic settings takes place below the level of participants' awareness, and is therefore largely inaccessible to self-report, hence the need for direct documentation. Additionally, as Shively (2018a) argues with particular reference to L2 sociopragmatics, in naturalistic settings (as opposed to role plays, tests, etc.) there are more likely to be "genuine affective, relational and material consequences" attached to interactions, so that participants are motivated to pursue interactional goals, work to maintain and repair relationships, and negotiate identities, and in general terms display their current level of interactional competence (Hall and Doehler, 2011). Recordings and accompanying transcriptions are of course indispensable for any detailed analysis of meaning negotiation and feedback, of scaffolding or microgenesis, or for conversation analysis.

Data collection and ethical issues
Regardless of precise theoretical orientations, the challenge of interactional data collection in SA settings is similar. Contexts must be found where participants engage in some form of face-to-face or online interaction, and where either audio- or videorecording is practicable, without threatening or overawing the participants. In practice, most such data comprises audiorecordings, self-recorded by participants, and studies are modest in scale. The settings most commonly documented in the SA literature include:
• interactions with host families in homestay settings; see, for example, Diao et al., 2018; chapters in DuFon and Churchill, 2006; Kinginger, 2015; Kinginger et al., 2016a, 2016b; Lee and Kinginger, 2016; Lee et al., 2017; McMeekin, 2017; Pryde, 2014; Shively, 2015, 2018b; Wilkinson, 1998, 2002;
• leisure interactions with local or international peers in halls of residence and student lounges; see, for example, Behrent, 2007; Diao, 2014a, 2016; Hasegawa, 2019; Kinginger and Wu, 2018;
• conversations with language partners; see, for example, Bryfonski and Sanz, 2018; Fernández, 2013, 2016; Fernández-García and Martínez-Arbelaiz, 2014; Kasper and Kim, 2015; Kurata, 2011; Ning, 2020; Shively, 2015, 2018b;
• service encounters; see, for example, Diaz et al., 2018; Ning, 2020; Shively, 2011, 2018b;
• online interactions; see, for example, Back, 2013; Diao, 2014b; Kurata, 2011.
This selection of settings needs some comment, as it clearly does not represent the full range of SA "domains of practice". Partly for reasons of access and audio quality, but also perhaps seeking domains where participants are most likely to make sustained interactional contributions in L2, there is a focus on small group or dyadic interaction, and on fixed settings (no parties, no evening outings or touristic excursions).
There are very few examples of classroom recordings and classroom discourse analysis (such as Fernández, 2018; McMeekin, 2006), even though SA participants typically attend language classes and/or TL-medium content classes (unless classroom research on English as a medium of instruction can be understood as relevant to the classroom experience of SA participants, e.g. Smit, 2010). There is no focus on non-interactive forms of TL input (e.g. unlike in other SLA fields, there is no tradition of gathering/analysing linguistic data representing what participants are reading or watching on TV, or lectures they may attend). Researchers investigating the acquisition of sociolinguistic variation by SA participants do allude to native speaker reference corpora (e.g. Regan et al., 2009 for French; Kanwit et al., 2018 for Spanish). However, the more specialized reference corpora whose creation is recommended by Taguchi and Collentine (2018) for relevant domains of practice within SA remain very rare; one example pointing to what could be done is "Infotravel", a small Spanish corpus including both authentic travel-related service encounters, and role play service encounters created by L2 learners (Diaz et al., 2018).
A further challenge concerns the ethical considerations involved in direct recording of naturalistic interactions, and/or the collection of online exchanges. There is little explicit discussion of this issue in the SA literature: most published studies refer only briefly to having received formal approval from institutional ethical review boards or to having obtained informed consent from participants, or say nothing about the issue. Shively (2018a) gives more extended consideration to ethical considerations in her discussion of naturalistic data collection in L2 pragmatics research. In her own research (2018b), she obtained informed consent for audiorecording from SA participants, from homestay family members, and from language partners, and identities were anonymized. However, the service encounter recordings collected by her participants were obtained covertly (as also were the native speaker recordings described in Diaz et al., 2018). Shively herself justifies this on the grounds that service encounters involve public speech; the identity of the service providers remained anonymous; informed consent would have been a barrier to the research; and the value of the research outweighed the risks of harm. The concept of covert recording remains contentious, however, and professional bodies recommend that, at a minimum, those recorded covertly should be informed retrospectively about the research, and offered an option to refuse consent (e.g. British Psychological Society, 2014). Similar issues arise in internet-based research and need to be taken into account in SA research involving the collection and analysis of Facebook posts, tweets and SMS (British Psychological Society, 2017).

Analysing data: The IH perspective
Given the focus in SA research on various types of naturalistic face-to-face interaction, it is not surprising that data preparation commonly involves transcription using the conventions of conversation analysis. However, data analysis is undertaken from a variety of perspectives; in this section, we briefly exemplify analyses motivated by constructs connected to the Interaction Hypothesis (IH). (See also a recent review by Bryfonski and Mackey, 2018.) An early study by Wilkinson (1998, 2002) pioneered the recording of homestay interaction, and found that homestay families in France were very active in meaning negotiation as well as providing explicit feedback/correction for sojourners. However, these were one-off recordings, so their typicality is unknown. McMeekin (2006) analysed sets of videorecorded interactions between Anglophone sojourners and Japanese homestay families for occurrences of meaning negotiations, input modifications, feedback, and pushed output, and compared these with the same sojourners' classroom interactions. She found noticeable differences, in particular that homestay talk offered many more meaning negotiations, implicit feedback and better tailored input. However, the classroom was more productive for sojourners' pushed output. In a later study (2017), with homestay data only, she looked in more detail at meaning negotiations, and argued that conversational trouble in the form of "word searches" was resolved collaboratively, through the cooperative deployment of communication strategies by both sojourners and their interlocutors.
Another small-scale study with a focus on acquisition, by Fernández-García and Martínez-Arbelaiz (2014), drew explicitly on the IH to investigate Spanish interaction in eight conversations recorded between American and Spanish student conversation partners in San Sebastian, Spain. They found that local peer interlocutors provided frequent recasts and lexical assistance in Spanish, as well as negotiations of meaning and utterance completions. They also found that L2 Spanish speakers exploited this assistance effectively, with frequent "uptake" of forms and willingness to solicit help. They speculate that the dyads' equivalent status as students of languages facilitated these collaborative efforts.
Similarly, Bryfonski and Sanz (2018) analysed dyad and small group conversations between American SA participants and Spanish conversation partners, recorded on three successive occasions over a six-week period. Following their return to the home institution, the SA participants completed short individual tests, focusing on lexis and grammar which had been the focus of negotiation and feedback during the recorded conversations, as well as stimulated recall interviews. The recordings were transcribed and all feedback episodes identified and coded for various types of implicit and explicit feedback (following Mackey, 2012). They also investigated the role of L1 English in meaning negotiations. Overall this study found that instances of feedback of all types diminished over time; that just under half the language points that had been the subject of negotiation had been learned; and that limited use of English played a constructive role in promoting successful meaning negotiation.
In addition to the studies just reviewed which explicitly adopt an IH perspective, analyses of feedback and uptake drawing on IH constructs can be found in some other direct studies of SA/L2 interactions. Examples are the analyses of Kurata (2011) of interactions between L1 English sojourners and their Japanese language partners (framed within sociocultural theory) and that of Ning (2020), who examined sample interactions in third language (L3) Catalan involving L1 Chinese sojourners and community interlocutors (framed within language socialization theory).

Analysing data: Language socialization, sociopragmatics and sociocultural theory
The main alternative approach to IH which has been adopted in the direct study of SA communication is some version of applied conversation analysis (Brouwer, 2012), framed by language socialization theory and/or sociocultural theory. While some of these studies focus at least partly on negotiations around L2 lexis and/or morphosyntax (e.g. the studies of Kurata, 2011, and of Ning, 2020, mentioned above), most are concerned with the development of sociopragmatic, interactional, and/or intercultural competence.
A number of studies of this type investigate acquisition of features of particular speech styles. For example, Cook (2008) asked nine SA participants in Japan to videorecord homestay dinner-table conversations on three occasions each, and studied degrees of politeness/formality in the resulting corpus. These data were analysed from an L2 socialization perspective, focusing on alternation between less formal and more formal speech registers ("plain style" and "masu style", the latter used to speak with authority or in a teacherly manner). Cook showed that, over time, the balance between styles used by the learners approached that of the expert speakers, and became more appropriate, e.g. shifting to masu style only when leading an activity. She argued that this was the result of (unconscious) socialization, as these style differences were not explicitly discussed with hosts. A similar study concerning the control of masu style is reported by Taguchi (2015), but in this case involving use of Japanese as lingua franca between L2 users from different language backgrounds. In her analysis of audiorecorded service encounters in Toledo, Spain, Shively (2011) tracked the acquisition over time of more appropriate (i.e. more direct) request styles by American sojourners learning Spanish. Diao et al. (2018) drew on a substantial corpus of homestay mealtime recordings, self-recorded by seven American students in China, to study development in another subdomain of interactional competence, i.e. conversation management. They conducted analysis on 12 hours of this corpus, comprising 20-minute excerpts selected from earlier and later sessions. Their focus of interest was on sojourners' production of "response turns" (RTs: backchannelling, repetitions, etc.), in the course of interaction; such turns are typically used less frequently in Chinese conversation than in English, and Diao et al. were interested in how far the participants' use of RTs would evolve toward Chinese norms. 
Their findings did show considerable changes in participants' use of RTs, though these were interpreted as primarily reflecting greater levels of understanding and ability to contribute actively to conversation, rather than any adoption of Chinese norms. Shively's multimethod study in Toledo (Shively, 2018b) also examined interactional competence, using audiorecorded data from peer and host family conversations. These were used to explore participants' use of "assessments" (judgmental comments: Shively, 2015), and also their alignment with Spanish modes of expressing humor (Shively, 2013).
Case study work in this tradition has produced suggestions that language attitudes and identities may affect the acquisition of particular pragmalinguistic features. Studies by Diao (2014a, 2016) of dyadic interactions between American sojourners and their Chinese roommates investigated the recent adaptation in young people's spoken Chinese of selected particles to indicate femininity and "cuteness" (specifically, the particles a/ya, la, me, o, eh/ye), and how far her participants would be socialized into the use of these particles. Her case study analysis showed that participants' (non)adoption of this speech style was largely influenced by their relations with their roommate, and the latter's gendered attitudes towards this. In her multimethod study previously discussed with reference to LCP, Fernández (2013, 2016) similarly collected a corpus of 27 hours' talk between Anglophone sojourners and Spanish conversation partners in Argentina. Her focus of interest was the acquisition of features of Argentinian youth speech style ("youngspeak", involving use of slang, taboo words, and vague language). However, her 2016 case study analysing interactions between two trainee teachers of Spanish (one American, one Argentinian) showed avoidance of "youngspeak" features and other local dialect features, which Fernández interprets as reflecting their shared career aspirations. She concludes that language "authenticity" must be viewed in context, as a developmental process, which is enacted relative to speakers' subjectivity.
Finally, a series of studies by Kinginger and associates have collected substantial corpora of SA homestay and peer (roommate) interactions, and have analysed linguistic and intercultural development from a more holistic, sociocultural perspective. In several homestay studies, American sojourners have self-recorded mealtime interactions with their homestay hosts in China. Kinginger et al. (2016a) collected 13 hours' recordings, from three different participants; Lee et al. (2017) collected 25 hours, from two further participants. While some attention was paid in analysis to the direct teaching of food vocabulary, these sets of recordings were analysed primarily to document the socialization of the participants into mealtime customs and traditions including use of linguistic expressions concerning values such as avoiding food waste. In case studies of roommate interactions, Kinginger and Wu (2018) focus on the telling of emotionally charged narratives, and on language play. They interpret these practices in terms of sociocultural theory, arguing that they were "leading practices conducive to the participants' learning on the microgenetic and ontogenetic levels" (p. 118). As mediational means, the participants drew on "Semiotic resources rich in cultural meanings, such as stances, personae, identities, ideologies, and narrative templates" (p. 118), which evolved over time through participants' growing intersubjectivity.

IV Conclusions
Overall, this survey has shown us a field of research in active development, and with increasing methodological variety and ambition. In conclusion, the strengths and limitations of the field, in documenting and understanding language engagement during SA, will be briefly summed up.
First, the indirect methods used to document the linguistic and social environment and social relationships of SA are becoming more varied and flexible, with potential for greater reliability in recording the language practices in which SA participants engage, and the social networks they develop. The innovative use of mobile apps to carry language logs has considerable potential to improve the reliability and detail of self-reports on in-sojourn linguistic practices. Encouraging features are the increase in multimethod studies, and concerns with triangulation. There is greater acknowledgement of the multilingual nature of sojourners' language practices, and of the significance of online practices. However, with limited exceptions studies remain relatively small, and there is a need for greater collective effort around the development of instruments such as LCP, SASIQ or app-based language logs, for fuller investigation of their properties, and for replication studies using them.
Second, regarding the direct recording of SA language practices, progress has also been made. Primarily by recruiting participants themselves as research assistants, ways have been found to capture important types of informal spoken interaction. We have learned much from the process, not least the dynamic nature of "authenticity" in informal expert-novice talk, and how this is tailored not only to immediate communicative needs, but also to identity issues (e.g. in the avoidance of dialect, or the modification of politeness norms). Important insights have already been drawn from these SA interaction corpora concerning meaning negotiation, feedback and L2 pragmatic development, alongside insights into intercultural socialization and identity development. However, the evidence base for different types of SA interaction remains small, and important types are hardly represented (e.g. classroom discourse, talk in organized activities such as sport).
When we return to the suggestions of Duff, Taguchi and Collentine, however, the most obvious "gap" in current direct datasets has to do with input rather than interaction. Specifically, the corpus linguistics approach suggested by Taguchi and Collentine is largely absent from the field. While SA learner corpora have been appearing (e.g. Tracy-Ventura et al., 2016), there is very little use in the field of general reference corpora for TLs to support analysis of TL development (with the partial exception of variationist research). And there has been no sustained effort to create SA-specific input corpora, nor to make any systematic collection of the actual input to which sojourners are exposed (other than interactional corpora discussed earlier). The only partial exceptions are SA learner corpora which include some samples of L1 speaker performances on the tasks required of L2 contributors (e.g. Czerwionka and Olson, 2020;Tracy-Ventura et al., 2016). Here, the field clearly differs not only from L1 acquisition research (e.g. Lieven, 2016;Rowe, 2012), but also from the corpus-based work of (non-SA) SLA researchers such as Dimroth et al. (2013), N Ellis et al. (2016), Eskildsen (e.g. Eskildsen, 2015), Hellermann (2008) or Pujadas and Muñoz (2019).
In those other fields it is clearly usage-based acquisition theory and its concerns with frequency, prototypicality, construction learning, etc. which has driven the creation and analysis of both general and genre-specific corpora of the types advocated by Taguchi and Collentine. However, while some SA researchers make clear their adherence to usage-based theory (such as the variationists Geeslin and Garrett, 2018), methodologically they focus almost exclusively on the analysis of variation within L2 production. Now that a subgroup of researchers have shown us the feasibility of documenting informal language use during SA, perhaps it is time to widen our theoretical paradigms (notably to include more substantial reference to usage-based theory), and/or to make testing them more central to our research programmes (e.g. in the case of the IH). In this way we may be able to make more systematic progress toward understanding how exactly informal language engagement is more/less beneficial, in achieving the central aims of language-related study abroad.

Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author received no financial support for the research, authorship, and/or publication of this article.