Brain activity predicts future learning success in intensive second language listening training

This study explores the neural mechanisms through which prior knowledge gained from pre-listening transcript reading supports comprehension of fast-rate speech in a second language (L2) and how these mechanisms relate to L2 learning. Top-down predictive processing based on prior knowledge may play an important role in L2 speech comprehension and in improving listening skill. By manipulating the pre-listening transcript effect (pre-listening transcript reading [TR] vs. no transcript reading [NTR]) and language type (first language [L1] vs. L2), we measured brain activity while L2 learners performed fast-rate listening comprehension tasks during functional magnetic resonance imaging. Thereafter, we examined whether TR_L2-specific brain activity could predict individual learning success after intensive listening training. The left angular and superior temporal gyri were key areas responsible for integrating prior knowledge with sensory input. Activity in these areas correlated significantly with gain scores on subsequent training, indicating that brain activity related to prior knowledge-sensory input integration predicts future learning success.


Background: second language listening training
Listening comprehension is defined as a process of constructing meaning from various acoustic signal-based knowledge sources (Rost, 2013). Development of listening skills is important for second language (L2) acquisition. Previous studies suggest that L2 listening skill can be improved with methods such as teaching metacognitive strategies (Vandergrift & Tafaghodtari, 2010; Cross, 2009), teaching segmentation and phonetic variations (Field, 2003), audio-visual approaches, and pre-listening activities (introducing vocabulary, questions about the speech content, etc.). Among these methods, pre-listening transcript reading is effective in improving L2 listening comprehension and acquisition (Field, 2008; Lund, 1991; Vandergrift & Goh, 2012). Pre-listening activities can help L2 learners comprehend better by enhancing their prior knowledge (Chang, 2009; Chang, 2012; Li, Wu, & Lin, 2017; Molavi & Kuhi, 2018; Perez, Noortgate, & Desmet, 2013). For example, individuals who perform pre-listening tasks, such as previewing captions, pictures, and vocabulary, comprehend significantly better than those who only perform listening tasks (Chang, 2009; Chang, 2012; Li et al., 2017). However, methods for improving listening comprehension do not always contribute to the acquisition of listening skills, because L2 learners cannot cope with the natural speech rate of native speakers.
One crucial factor associated with the listening difficulties of English as a Foreign Language (EFL) learners is speech rate, because listeners need to decode sound streams into meaningful units (i.e., bottom-up parsing) and simultaneously interpret the integrated input based on prior knowledge (i.e., top-down inference) (Blau, 1990; Graham, 2006; McBride, 2011; Rost, 2013). Although speech at a slower rate is easier to understand than the same text at a higher rate, slowing speech down is often impractical. Accordingly, enhancing speech processing speed is crucial for improving comprehension in L2 listening. One method for promoting processing speed is listening practice in which speech is delivered at a fast rate (Adank & Devlin, 2010; Banai & Lavner, 2012).
https://doi.org/10.1016/j.bandl.2020.104839 Received 30 October 2019; Received in revised form 3 June 2020; Accepted 14 July 2020

However, enhancing processing speed with fast-rate speech practice in L2 is not always successful, because the rate of speech is inversely correlated with comprehension success (McBride, 2011). To solve this problem, as Krashen (1985) asserted, comprehensible input is important for L2 acquisition. Thus, a training method that combines fast-rate speech with pre-listening transcript reading might be effective in improving L2 listening ability and speech processing speed without decreasing comprehension. Vandergrift (2007) observed positive effects of pre-listening transcript reading on L2 listening comprehension, attributing them to strengthened connections between a language's meaning in long-term memory and its speech. Reading a speech transcript before listening to it helps learners recognize words in the speech more quickly because word meanings are easier to retrieve (Vandergrift, 2007).
Prior to the current neuroimaging study, Kajiura (2019) conducted a behavioral experiment to investigate the effect of L2 listening practice using fast-rate speech in combination with pre-listening transcript reading. In that study, the effects of four training methods (fast-rate listening only [F], normal-rate listening only [N], fast-rate listening combined with transcript reading [FT], and normal-rate listening with transcript reading [NT]) were compared during intensive listening training (2.5 h × 5 days). The results revealed that the combined listening and transcript reading methods [FT and NT] significantly enhanced L2 learning compared to the listening-only methods [F and N]. Surprisingly, the FT and NT groups showed equally significant score improvements, without any deterioration due to the fast speech rate. Kajiura (2019) suggested that L2 listening practice using fast-rate speech in combination with pre-listening transcript reading enhances processing speed, which is the most important factor in improving L2 listening proficiency. Thus, if transcript reading is performed before L2 listening, even less proficient L2 learners might be able to map the sound to the meaning, comprehend better, and enhance their processing speed for fast-rate speech.
Hence, the objectives of this study were to clarify the brain mechanisms underlying fast-rate L2 speech listening with pre-listening transcript reading and to examine whether these mechanisms are related to learning efficiency. As such, we investigated whether the brain activity involved in fast-rate speech listening with transcript reading could predict an individual's future learning success during intensive L2 listening training. The following three cognitive processes may play an important role in L2 fast-rate listening combined with pre-listening transcript reading: (a) adjustment to fast-rate speech, (b) sound-meaning mapping, and (c) prior knowledge-sensory input integration. Among the three, prior knowledge-sensory input integration may contribute to speech perception the most, because decoding speech by relying only on bottom-up parsing, without integrating top-down prior knowledge, might be difficult for L2 learners. In other words, it is sometimes too difficult for L2 learners to adjust to fast-rate speech and to map sound to meaning without any prior knowledge. According to Friston (2010), the brain continually generates internal models by making inferences from sensory input, and perception is the brain's optimization of these internal models of sensory input. If the inference is accurate, the prediction error can be minimized, and perception of the sensory input can succeed. Therefore, we hypothesized that cognitive processes involved in prior knowledge-sensory input integration during L2 listening training with transcript reading might determine future learning success. Below, we describe candidate brain areas for L2 listening training with transcript reading.
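Friston's (2010) account can be illustrated with a minimal, purely didactic sketch (our own construction, not from the paper): perception is cast as updating an internal cause estimate v so that the prediction g(v) of a generative model matches the sensory input u, thereby minimizing the prediction error.

```python
import numpy as np

# Didactic toy of Friston-style perceptual inference (our construction, not
# from the paper): gradient descent on the squared prediction error between
# the sensory input u and the prediction g(v) of an internal cause estimate v.
def infer(u, g, dg, v0=0.0, lr=0.1, steps=200):
    v = v0
    err = u - g(v)
    for _ in range(steps):
        err = u - g(v)          # prediction error
        v += lr * err * dg(v)   # update the estimate to reduce the error
    return v, err

g = np.tanh                          # hypothetical mapping: cause -> sensation
dg = lambda v: 1.0 - np.tanh(v) ** 2
v_hat, final_err = infer(u=0.5, g=g, dg=dg)

# An accurate inference drives the prediction error toward zero, i.e.,
# g(v_hat) converges toward the input u.
print(round(float(g(v_hat)), 3), round(float(final_err), 5))
```

The point of the toy is only the logic of the framework: when the internal model can be brought into agreement with the input, the residual error shrinks and perception succeeds.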
Regarding the mapping of sound to meaning during natural speech comprehension, previous studies have reported that the temporoparietal region, including the angular gyrus (AG), plays a pivotal role (Ben Shalom & Poeppel, 2008; Kocagoncu, Clarke, Devereux, & Tyler, 2017; Zhuang et al., 2014). Kocagoncu et al. (2017) examined the real-time cortical dynamics underlying the rapid transformation of auditory input into meaningful interpretation using magnetoencephalography while participants listened to a series of spoken words. They found that lexical and semantic competition among words induces early transient effects in auditory semantic brain networks, such as the left supramarginal gyrus (SMG), left superior temporal gyrus (STG), left inferior frontal gyrus (IFG), and left middle temporal gyrus (MTG). However, these activations shifted and remained only in the left AG and MTG when the participants identified the unique semantics of the target words in the last phase, suggesting that the left AG and MTG are associated with successful integration of sound with target semantic features.
Many studies have also provided evidence that prior knowledge may enhance successful sound-meaning mapping (Klimovich-Gray et al., 2019; Hannemann, Obleser, & Eulitz, 2007; Obleser, Wise, Dresner, & Scott, 2007; Sohoglu et al., 2012; Wild, Davis, & Johnsrude, 2012). These studies demonstrated that predictive processing based on prior knowledge (top-down) facilitates both auditory perception and the integration of sound and semantic information. For example, Klimovich-Gray et al. (2019) evaluated predictive processing in speech perception using modifier-noun pairs whose strength of constraint varied (e.g., "lullaby banana" [weak constraint] vs. "peeled banana" [strong constraint]). In the strong constraint condition, hearing the word "peeled" makes the next word, "banana," easy to predict; this is not true in the weak constraint condition. By comparing brain activity in these two conditions, they found that frontal-temporal regions are modulated by predictive coding: the left IFG for top-down control processes, the left middle and posterior MTG for processing semantic information, and the left Heschl's gyrus for bottom-up perceptual analysis of auditory input. Similarly, Sohoglu et al. (2012) examined the effect of prior knowledge using written text as primes for unintelligible degraded speech under matching and mismatching conditions. They found that activity in the left STG was greater in the mismatching condition than in the matching condition. In their study, prior knowledge enabled efficient decoding of sensory input and reduced the prediction error, resulting in decreased neural activity in the matching condition.
Furthermore, varying levels of predictability modulate different patterns of brain activation in the frontal-temporal areas (Wild, Davis, & Johnsrude, 2012). For example, Obleser et al. (2007) found different brain activation patterns in the left AG and left superior temporal sulcus (STS) when both acoustic and semantic predictability were modulated across matching and mismatching conditions. Activity in the left AG increased with predictability (matching condition) only when speech comprehension was effortful (partially degraded speech). If the speech was sufficiently intelligible or completely unintelligible (depending on the degradation level), activity in the left AG did not change, indicating that predictability does not contribute to comprehension in these conditions. Contrastingly, activity in the bilateral STS increased as the amount of acoustic spectral information in the speech increased (i.e., with less degradation), regardless of semantic predictability. Furthermore, the authors showed that activity in the anterior STS was independent of that in the AG, indicating that the anterior STS is related to bottom-up acoustic processing, whereas the AG is associated with prior knowledge-sensory input integration. Thus, different patterns of brain activity during speech perception may emerge depending on how necessary prior knowledge is for decoding the sensory input.
Regarding the relationship between predictive processing based on prior knowledge and learning, Sohoglu and Davis (2016) found that neural processes modulated by prior knowledge contribute to long-term learning. By comparing brain activity before and after training with degraded L1 speech, they found that activity in the STG decreased after the training owing to minimization of the prediction error. Predictive coding is particularly important for learning new things: when top-down inferences are certain, bottom-up processing becomes compatible with these highly certain inferences, which contributes to learning (Kuperberg & Jaeger, 2016).

Neural mechanism underlying successful L2 learning
To the best of our knowledge, few studies have investigated the neural effects of predictive processing based on prior knowledge during L2 speech comprehension. Consistent with the aforementioned lines of reasoning, we assume that top-down predictive processing plays an important role in improving L2 listening skills. Many previous neuroimaging studies suggest that successful and less successful learners rely on functionally and structurally different brain areas for various aspects of L2 learning, such as syntax (Newman-Norlund, Frey, Petitto, & Grafton, 2006; Sakai et al., 2009), words (Sheppard, Wang, & Wong, 2012; Yang, Gates, Molenaar, & Li, 2015), sound learning (Golestani & Pallier, 2007; Golestani, Paus, & Zatorre, 2002; Golestani & Zatorre, 2004), and reading (Barbeau et al., 2017; Boukrina & Graves, 2013). Grant, Fang, and Li (2015) demonstrated that patterns of activation in various brain regions changed after 1 year of L2 learning; improved L2 proficiency was associated with decreased activation of the IFG and middle frontal gyrus (MFG) (associated with cognitive control) but increased activation of the MTG (associated with semantic processing) during the processing of language-ambiguous words. Subsequent work assessed whether learning success could be predicted before L2 training. Li and Grant (2016) found that successful learners had stronger connectivity between language-related brain areas (i.e., IFG-STG and IFG-inferior parietal lobule [IPL]) than less successful learners, even before L2 training. On the contrary, there was no network involving the left IPL for less successful learners either before or after the training (Li & Grant, 2016). Using a 12-week L2 immersion program, Barbeau et al. (2017) reported that activity in the left IPL increased during an L2 reading task, while it remained unchanged during an L1 reading task.
Moreover, there was a significant correlation between activity in the IPL and reading speed after the 12-week L2 immersion program (Barbeau et al., 2017). Furthermore, the authors demonstrated that individual left IPL activity before the immersion program predicted improvement in L2 reading speed (Barbeau et al., 2017). Thus, we hypothesized that cognitive processing involved in prior knowledge-sensory input integration during L2 listening training with transcript reading may determine future learning success.

Aims of the current study
We investigated (a) the brain mechanisms underlying the integration of prior knowledge with sensory input, using L2 fast-rate speech listening (sensory input) and transcript reading (prior knowledge), and (b) whether this brain activity predicts future learning success during subsequent intensive L2 listening training. To achieve these objectives, we measured brain activity while Japanese learners of English performed a fast-rate listening comprehension task during functional magnetic resonance imaging (fMRI) scanning. Before fMRI scanning, written transcripts were provided for half of the speech passages (the transcript reading [TR] condition), while the remaining passages had no transcripts (the non-transcript reading [NTR] condition). Brain activity during the TR (fast-rate listening with pre-listening transcript reading) and NTR (fast-rate listening without transcript reading) conditions was compared to extract the neural substrate underlying L2 fast-rate speech listening with prior knowledge (transcript). We also compared brain activity between L1 and L2 and investigated the interaction between the transcript reading effect (TR vs. NTR) and language type (L2 vs. L1). This comparison was included because we anticipated that the integration of prior knowledge and sensory input would be enhanced during effortful L2 speech comprehension relative to automatic L1 speech comprehension (see Obleser et al., 2007). In addition, by adding the L1 condition, we attempted to control for the episodic memory involved in the reading activities performed before fMRI scanning.
Furthermore, we investigated whether the specific activity involved in integrating prior knowledge and sensory input during L2 fast-rate speech listening can predict individual improvement after intensive L2 listening training. The same participants who performed the fMRI comprehension task attended five successive days of intensive training (2.5 h per day) beginning the day after the fMRI experiment. During the intensive listening training, the participants listened to a series of new fast-rate speech passages with transcripts each day for 2.5 h. We evaluated the improvement in the participants' listening before (pre) and after (post) the intensive training. Based on the aforementioned studies of predictive processing in speech perception (e.g., Klimovich-Gray et al., 2019; Obleser et al., 2007; Sohoglu et al., 2012), we hypothesized that the L2_TR condition would induce more activation than the NTR condition in the left frontal and temporoparietal regions (IFG, MTG, STG, and AG) that are associated with integrating prior knowledge and the sensory input of fast-rate speech. The degree of brain activity in these areas may be correlated with improvement in L2 listening through intensive L2 listening training.

Study design
The experiments were conducted in two sessions: an fMRI experiment followed by an intensive L2 listening training session, as previously described (Kajiura, 2019). The fMRI experiment was conducted on day 1 to evaluate differences in brain activity across four conditions defined by two task types and two languages: L2_TR (fast-rate listening after transcript reading in L2), L1_TR (fast-rate listening after transcript reading in L1), L2_NTR (fast-rate listening without transcript reading in L2), and L1_NTR (fast-rate listening without transcript reading in L1). After the fMRI experiment, the participants engaged in 5 days of English listening training. The fMRI experiment was also used to test whether brain activity before training predicted the effectiveness of the intensive L2 listening training (e.g., whether heightened activation in certain brain areas is associated with greater participant gains). Effectiveness was measured as the improvement in Test of English for International Communication® (TOEIC) listening scores through intensive L2 listening training (TOEIC post-test listening score minus TOEIC pre-test listening score). The study design is depicted in Fig. 1. This study was approved by the Institutional Review Board at the Graduate School of Medicine, Tohoku University, Sendai, Japan. Before the experiment, written informed consent was obtained from each participant.

Participants
Initially, we recruited 41 right-handed Japanese EFL learners (33 men, 8 women) without any history of neurological or psychiatric disease (mean age ± standard deviation [SD]: 20.82 ± 1.37 years). They were all graduate and undergraduate students in Japan. Before fMRI scanning, the participants took the TOEIC listening test to assess their L2 listening proficiency. Their level of English was intermediate (average TOEIC listening score: 289.87 ± 58.24 out of a maximum of 495) at the time of the experiment.
The control group took the two versions of the TOEIC proficiency test at the same interval as the training group but without any training. Their scores were compared with the pre- and post-test scores of the training group (obtained with the same two test versions) to control for test difficulty and repetition effects between the pre- and post-test conditions.

Listening stimuli
Forty short passages, in both English (L2) and Japanese (L1), were used as listening stimuli. These stimuli were recorded by native speakers of each language instructed to speak as quickly as possible (average words per minute for L2 passages: 344.83 ± 39.10; average morae per minute for L1 passages: 679.98 ± 34.15). The average auditory duration of L2 and L1 passages was 5684.70 ms (± 807.59 ms) and 7699.98 ms (± 1063.94 ms), respectively. We used the speech analysis software Praat to detect segments of silence in the audio files and excluded the leading and trailing silences. The purpose of using fast-rate speech was to clarify the mechanism underlying comprehension of unintelligible speech (fast-rate speech) with the help of prior knowledge (information gained by reading the transcript). Additionally, using extremely fast-rate speech instead of the unintelligible noise-vocoded speech used in previous studies might address a problem most L2 learners face, namely, adjusting to a fast speech rate. The passages were counterbalanced between conditions (with or without transcript reading, for both English and Japanese versions), and items were randomly presented within the same language. Although the passages were not counterbalanced between languages, the contents of both L1 and L2 passages were pooled from the Grade 3 Standard Test of English Proficiency and translated into Japanese for the L1 passages; thus, the level of readability was equivalent. The average Flesch Reading Ease scores were 82.1 for the L2 stimuli and 88.2 for the English sources of the L1 stimuli. The average Flesch-Kincaid Grade Level scores, expressed as U.S. grade levels, were 4 for the L2 stimuli and 3.1 for the English sources of the L1 stimuli. As mentioned before, passages containing relatively easy grammar and high-frequency words (third-grade level of junior high school in Japan) were used.
This was done so that the participants could understand the meaning of words without difficulty in terms of vocabulary and grammar when the passages were read. However, when these passages were listened to at high speeds, all participants would have had difficulty in comprehending them without prior knowledge (i.e., pre-listening transcript reading).
The purpose of the comprehension questions was to force the participants to listen to the passages carefully. Because the accuracy rate of the yes/no comprehension questions alone was not very reliable, subjective intelligibility judgment tasks were also given, before the comprehension questions, to evaluate comprehensibility. Example stimuli are included in the supplementary materials. All stimuli were presented with E-Prime software (Schneider, Eschman, & Zuccolotto, 2002).

fMRI tasks
After the participants took the TOEIC listening test, the whole experimental procedure was carefully explained to each of them. A practice session of the fMRI experimental tasks and the transcript reading tasks was conducted prior to the experimental fMRI tasks. Additionally, the participants were instructed to join a 5-day intensive L2 listening training program.
Practice fMRI tasks were presented with E-Prime software outside the scanner, because of time limitations in the scanner, and included three trials (passages, an intelligibility screen, and comprehension questions) in L1 and L2. Therefore, the participants understood the content of the task prior to entering the fMRI scanner. After the practice, a transcript reading task was conducted outside the scanner in the laboratory. Transcripts in each language for half (20) of the L1 and L2 passages (speech) presented in the fMRI experiment were read silently by the participants on a personal computer. The L2 and L1 passages were randomly presented, and the participants were instructed to read the series of passages as fast as possible. To rate the subjective intelligibility of each passage, they pressed the 1, 2, 3, or 4 key (1 representing the lowest comprehension level and 4 the highest) on the computer keyboard. The reaction time was measured from the end of the passage presentation to the first key press for the intelligibility judgment. The participants were informed that the reaction time was being measured, to encourage them to read as quickly as possible. A further reason for conducting the intelligibility judgment was to ensure that the passages presented in written form were comprehensible to all participants. This allowed confirmation that the difficulty of the fMRI listening tasks arose exclusively from speech speed and not from speech content.
The participants performed a listening task in the fMRI scanner with or without prior transcript reading (outside the scanner) in both L1 and L2 (L1_TR, L2_TR, L1_NTR, and L2_NTR). Each language task had two sessions conducted separately to avoid capturing brain activity related to code-switching; the two L2 sessions were presented first, followed by the two L1 sessions. The material in the TR and NTR conditions and the order of the task sessions were counterbalanced across participants. An event-related design was used for the fMRI experiment. The participants listened to one passage as a practice session in the fMRI scanner to become familiar with the scanner environment and to allow for stable noise cancellation.
Immediately after the practice session, the fMRI experimental task began. Participants heard 40 passages in both English and Japanese, half of which they had read prior to scanning. Each trial started with a 100 ms presentation of white fixation crosses (+++) on a black screen, followed by a 100 ms period of silence. A passage was then played, followed by a time-limited (3000 ms) subjective intelligibility judgment using the keys labeled 1, 2, 3, and 4, as explained above. As soon as a button was pressed, a comprehension question (yes/no question) was presented to assess content comprehension; this presentation was also time-limited (5000 ms). The participants were expected to listen to the passages carefully in order to answer the comprehension questions and make the intelligibility judgment. We expected the accuracy and intelligibility rates to be higher in the TR condition than in the NTR condition, regardless of the participants' L2 proficiency. The inter-trial intervals and the intervals between passage presentation and the comprehensibility judgment varied between 2 s and 10 s. All participants wore noise-cancelling headphones (Optoacoustics Ltd., Israel), which reduced subjective MRI scanning noise and delivered the auditory stimuli clearly. The participants spent a maximum of 32 min in the scanner, which comprised four sessions (two L2 sessions and two L1 sessions) × 20 trials × 20 s per trial: passage listening (7 s on average), intelligibility judgment (3 s), comprehension questions (5 s), and inter-trial intervals (2-12 s). The design of the experiment is summarized in Fig. 2.
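The reported trial structure can be sanity-checked with a small back-of-the-envelope calculation. All durations below are taken from the text; setting the inter-trial interval to the midpoint of the jittered 2-10 s range is our own assumption.

```python
# Back-of-the-envelope check of the reported scanner timing. All durations
# are taken from the text; the inter-trial interval (ITI) is set to the
# midpoint of the jittered 2-10 s range, which is our own assumption.
fixation_s = 0.1
silence_s = 0.1
passage_s = 7.0          # average passage duration
intelligibility_s = 3.0  # time-limited intelligibility judgment
comprehension_s = 5.0    # time-limited yes/no comprehension question
iti_s = 6.0              # assumed midpoint of the 2-10 s jitter

trial_s = (fixation_s + silence_s + passage_s
           + intelligibility_s + comprehension_s + iti_s)
sessions, trials_per_session = 4, 20
total_min = sessions * trials_per_session * trial_s / 60.0

# ~21 s per trial and under half an hour of task time, consistent with the
# reported "20 s per trial" approximation and the 32 min maximum.
print(f"{trial_s:.1f} s per trial, {total_min:.1f} min total")
```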

fMRI data acquisition and preprocessing
All images were acquired using a 3.0-T Philips Achieva system (Eindhoven, the Netherlands). A gradient echo planar imaging sequence was used with the following parameters: echo time = 30 ms, flip angle = 85°, slice thickness = 3.00 mm, field of view = 192 mm, and a 64 × 64 matrix for functional images. In total, 40 slices covering the whole brain were acquired every 2.5 s. Additionally, T1-weighted anatomical images (thickness = 1 mm; field of view = 224 mm; repetition time = 1,800 ms) were obtained. The following pre-processing procedures were performed: realignment to adjust for head motion, adjustment for differences in acquisition timing across slices, co-registration of the T1 image to the mean functional image, spatial normalization to the Montreal Neurological Institute (MNI) template, and smoothing in three dimensions using a Gaussian kernel with a full-width at half-maximum of 8 mm. All pre-processing procedures and statistical analyses were performed using MATLAB (MathWorks, Natick, MA, USA) and Statistical Parametric Mapping software (SPM12, Wellcome Department of Imaging Neuroscience, London, UK).
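As a practical note, SPM specifies the smoothing kernel by its full-width at half-maximum (FWHM), whereas generic convolution routines usually expect the Gaussian sigma; the conversion is FWHM = sigma × 2√(2 ln 2). A minimal sketch, using the 3 mm slice thickness above as an approximate voxel size:

```python
import math

# SPM reports the Gaussian smoothing kernel by its full-width at
# half-maximum (FWHM); generic convolution code usually wants sigma.
# Conversion: FWHM = sigma * 2 * sqrt(2 * ln 2).
FWHM_MM = 8.0    # kernel size used in this study
VOXEL_MM = 3.0   # approximate voxel size (from the 3.00 mm slice thickness)

sigma_mm = FWHM_MM / (2.0 * math.sqrt(2.0 * math.log(2.0)))
sigma_vox = sigma_mm / VOXEL_MM
print(f"sigma = {sigma_mm:.2f} mm = {sigma_vox:.2f} voxels")
```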

Intensive L2 listening training
Intensive L2 listening training was conducted to investigate whether brain activity while listening to speech whose transcripts had previously been read predicts future learning effects (gains during training). The participants took the TOEIC listening comprehension test, a standard English proficiency test produced by the Educational Testing Service and used worldwide. Five sets of practice tests from the TOEIC listening section (Educational Testing Service, 2008a, 2008b) were used for the training. In the pre- (before training) and post-test (after training) evaluations, different materials were used to measure the listening proficiency of the participants. One day after the fMRI experiment, the participants started 5 days of listening training involving transcript reading and fast-rate speech listening. On average, the participants trained for 2.5 h each day. On day 5, they took a listening proficiency test to check for any improvement over their pre-test proficiency levels. The control group took the TOEIC tests twice, at the same interval as the training group, without undergoing any intervening training; their scores were compared with those of the training group.
Similar intensive L2 listening training has been used previously and demonstrated to significantly improve the participants' listening proficiency (Kajiura, 2019). The training included 100 multiple-choice questions across five TOEIC listening tests, covering four types of listening sections (part 1 involved picture-description questions, for which the participants had to choose the sentence that best describes a picture; part 2 involved selecting the best response to an auditory question or statement; part 3 involved questions about conversations; and part 4 involved questions about short talks given by a single speaker; the answer choices were in written form for parts 3 and 4). Additionally, the training included transcript reading of the same TOEIC listening test and a second TOEIC listening test containing the same 100 questions with fast-rate speech (1.8 times faster than the original speed for parts 1 and 2, and 1.3 times faster for parts 3 and 4, based on the results of a pilot study). The speeds differed between parts because of the length of the passages; the passages in parts 3 and 4 were much longer than those in parts 1 and 2. These procedures were repeated for 5 days, with different materials provided each day. The participants came to the laboratory and practiced 500 listening questions, twice in total, over the 5 days. A universal serial bus drive containing all of the tasks, divided into five folders with instructions for each day, was given to the participants, and a log of the results was collected. Therefore, the participants could engage in the program by following the daily instructions.

Fig. 2. Design of the experiment: event sequence used for the fMRI task.

Behavioral data analyses
For the fMRI session, the accuracy rates of the comprehension questions and the subjective intelligibility rates across task types [TR and NTR] and languages [L2 and L1] served as performance indices. To evaluate participant performance, a two-way analysis of variance (ANOVA) was conducted with task type (TR and NTR) and language (L1 and L2) as factors. Furthermore, the training effect was compared between the training group and the control (no training) group by a two-way mixed-design ANOVA (group × test period) using the TOEIC scores. For the control group, 19 additional students who took the TOEIC proficiency tests without training were enrolled to control for test difficulty between the pre- and post-test conditions (two different versions of the TOEIC proficiency test).
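For a 2 × 2 within-subject design like this one, each ANOVA effect has a single degree of freedom, so every F test is equivalent to a paired t-test on the corresponding contrast (F(1, n−1) = t²). A minimal sketch with simulated, not actual, accuracy data illustrates the three tests:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 19  # number of participants entering the analysis

# Simulated (not actual) accuracy rates for the 2 x 2 within-subject design:
# task type (TR / NTR) x language (L1 / L2).
l1_tr = rng.normal(0.90, 0.05, n)
l1_ntr = rng.normal(0.85, 0.05, n)
l2_tr = rng.normal(0.75, 0.08, n)
l2_ntr = rng.normal(0.60, 0.08, n)

# Each effect in a 2 x 2 repeated-measures ANOVA has 1 df, so each F test
# is equivalent to a paired t-test on a contrast: F(1, n-1) = t^2.
def effect(a, b):
    t, p = stats.ttest_rel(a, b)
    return t ** 2, p

f_task, p_task = effect((l1_tr + l2_tr) / 2, (l1_ntr + l2_ntr) / 2)  # TR vs. NTR
f_lang, p_lang = effect((l1_tr + l1_ntr) / 2, (l2_tr + l2_ntr) / 2)  # L1 vs. L2
f_int, p_int = effect(l1_tr - l1_ntr, l2_tr - l2_ntr)                # interaction

print(f"task:        F(1,{n - 1}) = {f_task:6.2f}, p = {p_task:.4f}")
print(f"language:    F(1,{n - 1}) = {f_lang:6.2f}, p = {p_lang:.4f}")
print(f"interaction: F(1,{n - 1}) = {f_int:6.2f}, p = {p_int:.4f}")
```

In practice a dedicated ANOVA routine would be used; the contrast formulation is shown only because it makes the structure of the three tests explicit.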

fMRI analyses
SPM12 implemented in MATLAB (MathWorks, Natick, MA, USA) was used to conduct a conventional two-level analysis of the event-related fMRI data. The first 22 scans were discarded, as they contained magnetization stabilization scans and the practice session; the remaining volumes were used for analysis. Two participants were excluded because of excessive motion within the scanner (over 3 mm) and technical problems. In the first-level analysis, hemodynamic response data from individual participants during fast-rate speech listening (from the time of onset to the time of offset) across the different conditions were analyzed using a general linear model. Blood-oxygen-level-dependent (BOLD) MRI signals were high-pass filtered with a cutoff period of 128 s to eliminate low-frequency artifacts. Four regressors, one for each task type (L1_TR, L2_TR, L1_NTR, and L2_NTR), were created to model the hemodynamic responses. To extract brain activity while the participants were listening to comprehensible input in L2_TR, we excluded the trials that the participants had rated as incomprehensible. First, contrast images for each task (L1_TR, L2_TR, L1_NTR, and L2_NTR) in each participant were prepared for the second-level analysis. Next, to extract the TR effect in L2, a repeated-measures flexible factorial 2 × 2 ANOVA with language (L1/L2) and task type (TR/NTR) as factors was performed at the second level. The statistical threshold for all reported results was set initially at an uncorrected P < 0.001 at the voxel level and then family-wise error (FWE) corrected to P < 0.05 at the cluster level.
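The first-level model can be illustrated with a sketch of how one condition regressor is built: a boxcar over the listening periods convolved with a canonical hemodynamic response function (HRF), sampled at the 2.5 s repetition time. The HRF parameters and the onsets below are illustrative, not those of the actual study, and this is a simplification of what SPM12 does (it omits, e.g., microtime resolution and the high-pass filter):

```python
import numpy as np
from scipy.stats import gamma

TR = 2.5  # repetition time in seconds, from the acquisition parameters

def canonical_hrf(tr, duration=32.0):
    """Double-gamma canonical HRF sampled at the TR (SPM-like shape;
    parameters here are illustrative, not SPM12's exact defaults)."""
    t = np.arange(0.0, duration, tr)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0  # peak minus undershoot
    return h / h.sum()

def condition_regressor(onsets, durations, n_scans, tr):
    """Boxcar over the listening periods of one condition (e.g., L2_TR),
    convolved with the HRF and truncated to the session length."""
    box = np.zeros(n_scans)
    for onset, dur in zip(onsets, durations):
        start = int(round(onset / tr))
        stop = int(round((onset + dur) / tr))
        box[start:stop + 1] = 1.0
    return np.convolve(box, canonical_hrf(tr))[:n_scans]

# Hypothetical onsets/durations (s) of three ~7 s passages in one session
reg = condition_regressor([10.0, 60.0, 110.0], [7.0, 7.0, 7.0],
                          n_scans=80, tr=TR)
X = np.column_stack([reg, np.ones(80)])  # design matrix: condition + constant
print(X.shape, round(float(reg.max()), 3))
```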
In the second-level analysis, first, to examine the main effect of task, the TR conditions were compared with the NTR conditions [TR (L1 + L2) vs. NTR (L1 + L2)]. Second, the main effect of language was tested with the contrast [L2 (TR + NTR) vs. L1 (TR + NTR)]. Third, to investigate the interaction between task type (TR vs. NTR) and language (L2 vs. L1), the contrast [L2 (TR-NTR)-L1(TR-NTR)] with the exclusive mask [L1 (NTR-TR)] was tested. Finally, to identify the brain regions involved in the learning effects associated with L2 listening training, we performed a whole-brain regression analysis and a region-of-interest (ROI) analysis in the areas observed as an interaction effect of the TR in L2. In the whole-brain regression analysis, we entered the first-level contrast image from the TR condition in L2 into a second-level regression using the score gained during L2 listening training (2.5 h × 5 days) as a predictor variable. The slope of the regression line associating the L2 training effect (the gain score) with brain activation was calculated at the voxel level. For the ROI analysis, we computed the correlation between the gain score and the mean activation in the L2_TR condition, extracted with MarsBaR (version 0.44) from the cluster that the second-level flexible factorial design identified as specifically activated in the L2_TR condition.
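The whole-brain regression step amounts to fitting, at every voxel, a line relating the participant's L2_TR contrast value to their training gain score, then testing the slope. A sketch on synthetic data follows; the participant count matches the analyzed group, but the voxel count, effect size, and "planted" voxels are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_sub, n_vox = 19, 1000          # 19 participants, hypothetical voxel count

# Synthetic first-level L2_TR contrast values and training gain scores
con = rng.normal(0.0, 1.0, (n_sub, n_vox))
gain = rng.normal(0.0, 1.0, n_sub)
con[:, :10] += np.outer(gain, np.full(10, 0.8))  # plant a true effect in 10 voxels

# Voxel-wise regression of contrast value on gain score (intercept + slope)
X = np.column_stack([np.ones(n_sub), gain])
beta = np.linalg.lstsq(X, con, rcond=None)[0]    # shape (2, n_vox)
slope = beta[1]

# t-statistic per voxel for the slope
resid = con - X @ beta
sigma2 = (resid ** 2).sum(axis=0) / (n_sub - 2)
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
tvals = slope / se
```

In SPM the same slope test is run over the whole brain and then thresholded with a multiple-comparison correction; with only 19 participants, a moderate voxel-wise effect can easily fail a corrected whole-brain threshold while surviving in a pre-specified ROI, which is consistent with the pattern of results reported below.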

Behavioral transcript effects
In fMRI, accuracy rates on the comprehension questions across the task types (TR and NTR) and languages (L2 and L1) were analyzed. A two-way ANOVA showed significant main effects of task type (F(1,18) = 18.84, p < 0.001, η² = 0.511) and language (F(1,18) = 31.04, p < 0.001, η² = 0.633) on performance, but no interaction between them. The results revealed that the accuracy rate of the TR conditions was significantly higher than that of the NTR conditions and that the accuracy rate in L1 was significantly higher than that in L2 (Fig. S1 in the supplementary section).

Brain imaging results
Because we used extremely fast-rate speech, the participants (except the two excluded for technical problems) were divided into two groups based on the accuracy and subjective comprehensibility scores from the fMRI experiment. Owing to the high speech rate and scanner noise, some participants could not reach the criterion scores for fast-rate L2 speech comprehension. In the lower-scoring group, the accuracy and subjective comprehensibility scores did not differ significantly between the TR and NTR conditions in L2 (see Tables S1 and S2 for descriptive statistics by group in the supplementary section). Therefore, further analysis of the brain data was conducted on the higher-scoring participants to determine the brain areas involved in listening comprehension (prior knowledge-sensory input integration). Had we used easier (slower) speech, the accuracy rates of both the L2_TR and L2_NTR conditions would have been high, but the neural substrate of prior knowledge-sensory input integration could not have been extracted. The higher-scoring group included 19 students (13 men and 6 women).

The main effect of the task types (TR vs NTR) and languages (L2 vs L1)
The TR and NTR conditions in both languages (L1 & L2) were first compared. Greater activity was observed in the TR condition than in the NTR condition in the left IPL (including the AG), left anterior cingulate cortex, supplementary motor area, left insula, right precuneus, right putamen, and bilateral IFG (Table 1, Fig. 3).
The effect of language was as follows: higher activity was observed in L2 than in L1 in the right putamen, left precuneus, and bilateral cerebellum. In contrast, higher activation of the bilateral MTG and IPL, including the AG, was observed in L1 than in L2 (Table 2, Fig. 4).

Interaction between the task types and languages
We found significant differential activation in the left AG extending to the posterior STG and right putamen between the TR and NTR conditions as well as L2 and L1 by calculating L2(TR-NTR) − L1(TR-NTR) (see Table 3 and Fig. 5).

Behavioral training effects
The training effect between the training and control groups was compared by a two-way mixed-design ANOVA (group × test period) using TOEIC scores. The scores of the 19 higher-scoring participants accepted for the MRI analysis were used as the training-group data because our motivation was to clarify whether brain activity predicts improvement from intensive L2 listening training. There were significant main effects of group (F(1,36) = 11.459, p = 0.001, η² = 0.242) and test period (F(1,36) = 78.200, p < 0.001, η² = 0.685), as well as an interaction between the two (F(1,36) = 25.737, p < 0.001, η² = 0.417). Post hoc analyses revealed a significant performance difference between the pre- and post-test conditions in both the training group (F(1,36) = 62.199, p < 0.001, η² = 0.776) and the control group (F(1,36) = 16.033, p < 0.001, η² = 0.450), and also between the groups in the post-test condition (F(1,36) = 62.199, p < 0.001, η² = 0.776) (Table 4, Fig. 6). There was no significant difference in pre-test scores between the groups (F(1,36) = 0.67, p = 0.417, η² = 0.002). Although the control group showed a significant difference between the pre- and post-test conditions because of test repetition (the pre- and post-tests were similar in format although different in content), significantly greater improvement was found in the training group than in the control group, indicating the effectiveness of the listening training. This is supported by the subjective answers (on a 5-point Likert-type scale) to the post hoc questionnaire item, "Do you think you improved your listening ability through this intensive L2 listening training?" The average answer was 4.512, where 1 indicated no perceived improvement and 5 indicated great improvement.

Predicting training effects from brain activity
No significant region was found in the whole-brain regression analysis. The ROI analysis, however, revealed a significant relationship between training effects and brain activity in the specific area extracted from the second-level flexible factorial design in the L2_TR condition. A significant positive correlation was found between the score gained through intensive L2 listening training and the mean activity of the left AG/posterior STG (x, y, z = −51, −55, 20; 124 voxels in the cluster) (Spearman's ρ = 0.59, p = 0.005), using signal intensities calculated with MarsBaR version 0.44 (Fig. 7). There was no significant correlation between brain activity in the same area and the pre-test score (Spearman's ρ = −0.01, p = 0.562).
These results indicate that individuals whose left AG/posterior STG was highly activated in the L2_TR condition could achieve a greater benefit from intensive L2 listening training (Fig. 7).
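The ROI correlation reported above can be reproduced in outline with scipy's Spearman test. The values below are simulated stand-ins for the MarsBaR-extracted signal intensities and standardized gain scores, not the study's data, and the strength of the simulated relationship is arbitrary:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n = 19   # participants in the training group

# Hypothetical per-participant values: mean parameter estimate in the left
# AG/posterior STG under L2_TR, and a gain score tied to it plus noise
roi_activity = rng.normal(0.0, 1.0, n)
gain_logit = roi_activity + rng.normal(0.0, 0.3, n)

rho, p = spearmanr(roi_activity, gain_logit)
```

Spearman's ρ is rank-based, so it is robust to outlying signal intensities and to any monotonic rescaling of the gain scores, which makes it a conservative choice for a sample of this size.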

Discussion
The purpose of this study was to explore the neural mechanisms related to the integration of prior knowledge and sensory input using fast-rate speech with pre-listening transcript reading, and to determine whether the activity of regions related to prior knowledge-sensory input integration predicts future learning success after subsequent intensive L2 listening training. Hence, we compared brain activity between task types (TR: fast-rate speech with pre-listening transcript reading vs. NTR: without pre-listening transcript) and languages (L2 vs. L1). Moreover, to identify the brain regions associated with integrating top-down prior knowledge with bottom-up sensory input, we examined whether the differential activation between task types was greater in L2 than in L1 [L2(TR-NTR)-L1(TR-NTR)], because this integration process might be more important for bottom-up decoding in L2 than in L1. Consistent with our hypothesis, we found significantly greater differential activation between the task types in the left AG and posterior STG for L2 fast-rate speech comprehension. Furthermore, activity in these areas in the L2_TR condition was positively correlated with the L2 training-induced score gain. Our results indicate that individuals whose left AG/posterior STG is highly active while listening to fast-rate speech after pre-listening transcript reading, i.e., those who try to integrate prior knowledge (gained from the transcript) with sensory input (fast-rate speech), tend to be more successful at intensive L2 listening training.

Activity for specific processing in L2_TR: integrating prior knowledge and sensory input
Significantly greater differential activation between the TR and NTR conditions in L2 than in L1 was found in the left AG/STG and right putamen. Although both regions were activated in the specific L2_TR condition, their functions might differ. First, activation in the left AG/posterior STG in the specific L2_TR condition may reflect the integration of prior knowledge (gained by transcript reading) with sensory input (fast-rate speech). Our finding is in line with previous studies reporting that the degree of activation in the left AG and STG for understanding sensory input is affected by the availability of prior knowledge (Hannemann et al., 2007; Obleser et al., 2007; Sohoglu et al., 2012; Wild, Davis & Johnsrude, 2012). For example, Obleser et al. (2007) elucidated detailed functions of the left AG (BA39) and left anterior STS/STG (BA21) during comprehension using different levels of degraded speech: the left AG was sensitive to predictability (intermediately degraded speech: 8-band), while activity in the left anterior STS/STG increased as the spectral detail of the speech increased from 2-band (highly degraded) to 32-band (clear). The result of the current study is consistent with those findings. The integration of prior knowledge with sensory input was likely processed in the left AG/posterior STG when it was required; fast-rate speech was comprehensible in L1 but not in L2. Prior knowledge acquired from pre-listening transcript reading was less important in L1 because the participants could automatically decode the sensory input, i.e., fast-rate speech, without using prior knowledge. By contrast, prior knowledge was more important when listening to fast-rate speech in L2 because the speech was incomprehensible without compensation from prior knowledge. Hence, activity in the AG increased in the L2
condition when prior knowledge was available, and cognitive processing for prior knowledge-sensory input integration may play a role in L2 speech comprehension. The left AG is also associated with mapping the sounds and meanings of lexical items (Kocagoncu et al., 2017; Zhuang et al., 2014). Kocagoncu et al. (2017) reported that the AG responded when the meaning of target words was identified while participants listened to a series of words, indicating a role for the AG in auditory-semantic integration. The left AG thus appears essential as a functional component of speech comprehension, including semantic retrieval and sound-meaning mapping. In the current study, the behavioral results showed that both accuracy rates and subjective comprehensibility judgements were significantly higher for the L1 tasks than for the L2 tasks, and the pattern of activation in the left AG was comparable with these behavioral results. This suggests that the left AG was involved in accessing successful semantic representations from sounds through both automatic processing in L1 and predictive processing in L2. Indeed, the left AG is implicated in functions such as semantic processing (e.g., Price, Peelle, Bonner, Grossman, & Hamilton, 2016), sound-meaning convergence of multimodal inputs (Seghier, 2013), and accessing the mental lexicon to recognize words (Kocagoncu, Clarke, Devereux, & Tyler, 2017). Furthermore, the AG may serve as a semantic hub underlying functions including sentence comprehension, discourse, semantic and episodic memory retrieval, accessing stored items, both bottom-up and top-down processing, integrating complex information, and knowledge retrieval (Ben Shalom & Poeppel, 2008; Binder, Desai, Graves, & Conant, 2009; Price et al., 2016; Seghier, 2013), and ultimately, language learning (Li & Grant, 2016).
We interpret from our data that one function of the AG is prior knowledge-sensory input integration and that, as a result of this integration, it can be involved in the functions listed above. More specifically, the greater activation in the left AG during the L2_TR condition than during the L2_NTR condition could be associated with comprehensible speech perception and successful semantic retrieval enhanced by predictive processing.
Second, the function of the right putamen activation observed in the interaction might differ from that of the left AG/posterior STG, since this area was less activated in the L1 than in the L2 conditions. We interpret the activation in the right putamen as related to learning functions in L2 rather than to the automatized, acquired skill in L1, and we speculate that the higher activation in the L2_TR than in the L2_NTR condition reflects procedural learning during L2 listening. Our interpretation is supported by findings of previous neuroimaging studies on reinforcement learning of auditory information (Golestani & Zatorre, 2004; Ullman, 2006; Zarate & Zatorre, 2008). Activation in the putamen has been frequently reported in the processing of procedural knowledge, reinforcement learning, forming predictions from memories, greater learning demand, and training effects in language learning (Golestani & Zatorre, 2004; Ullman, 2006), as well as in the articulation and motor control of speech (Golestani & Zatorre, 2004; Zarate & Zatorre, 2008).
Fig. 5. Results of the activation in the interaction effect between the task types and languages. Brain areas showing greater differential activation between the TR and NTR conditions in L2 than in L1 [L2(TR-NTR)-L1(TR-NTR)]. L1, first language; L2, second language; TR, transcript reading; NTR, non-transcript reading; AG, angular gyrus; STG, superior temporal gyrus. The means of the parametric estimates in each area under each condition are plotted on the vertical axis. The error bars indicate 95% confidence intervals (CIs). P < 0.05 FWE-corrected at the cluster level.
Fig. 6. Results of the TOEIC scores before (pre) and after (post) L2 listening training. Control: control group; Training: training group; Pre: pre-test; Post: post-test. The error bars show 95% confidence intervals (CIs). L2, second language; CI, confidence interval.
Since the putamen is also associated with detecting errors and providing feedback (Ullman, 2006), we presume that, by comparing the speech with the phonological representation created by reading the transcript in the L2_TR condition, learners might detect errors, receive feedback on the sounds they heard, and adjust their distorted representations, which could result in learning improvement and thereby induce activation in the right putamen. Zarate and Zatorre (2008) explained that recruitment of the putamen is associated with expertise in pitch adjustment and in correcting perceived errors during auditory feedback. During pitch-shifting tasks, activation of the bilateral putamen was greater in expert singers than in novices because expert singers have learned to monitor their auditory feedback to produce notes correctly; the authors concluded that the increased bilateral putamen activity reflected monitoring of auditory feedback to perceive errors and thereby regulate the voice to the musical notes. Friston (2010) explained that the brain continually infers and predicts sensations to optimize its internal models of sensory input. By detecting errors and using feedback to optimize the internal representation to approximate the speech they heard, successful learning may occur, and the putamen might be responsible for this function (Golestani & Zatorre, 2004; Ullman, 2006; Zarate & Zatorre, 2008).

The relationship between the brain activity associated with prior knowledge-sensory input integration and learning success
The degree of activity in the AG, observed as an effect of top-down predictive processing by prior knowledge during L2 fast-rate speech comprehension, can predict future learning success after intensive training. As activity in the left AG did not correlate with participants' proficiency before training, the correlation cannot be explained merely by proficiency at the time of the experiment. Rather, the results indicate that individual patterns of brain activation under the L2_TR condition predicted the learning effect of training, measured by score gains after short-term, intensive listening training of 2.5 h × 5 days. A positive correlation between the score gains and the L2_TR condition was found in the left AG/posterior STG, indicating that individuals with highly active left AG/posterior STG in the L2_TR condition tended to be more successful at this training. As shown previously, activation of the left AG/STG is associated with the integration of prior knowledge with sensory input, semantic processing, and the mapping of sound to meaning (Ben Shalom & Poeppel, 2008; Burgaleta, Baus, Diaz, & Sebastian-Galles, 2014; Kocagoncu, Clarke, Devereux, & Tyler, 2017; Zhuang et al., 2014). In addition, multiple reports on language learning have demonstrated higher activation or larger volume of the AG/inferior parietal gyrus in successful learners (Barbeau et al., 2017; Della Rosa et al., 2013; Golestani & Pallier, 2007; Golestani, Paus, & Zatorre, 2002; Mechelli et al., 2004; Yang, Gates, Molenaar, & Li, 2015). Therefore, our result indicates that individuals whose left AG/STG is highly activated in the L2_TR condition may be more successful at the type of intensive L2 listening training employed here.
Therefore, if prior knowledge (obtained by transcript reading) is successfully integrated with sensory input (fast-rate speech) and the fast-rate speech is successfully mapped onto the appropriate meanings, these individuals have the potential to be successful learners. From the perspective of predictive coding in L2, to the best of our knowledge, this is the first study to show that brain activity related to prior knowledge-sensory input integration predicts individual learning success, and that high-certainty top-down inferences help bottom-up decoding and lead to improved learning of L2 listening.

Conclusion: limitations and future studies
Our study had some limitations. First, we analyzed data from only 19 of 41 participants because the remaining participants showed no score differences between the TR and NTR conditions in L2. We assume that the scanner noise may have made the listening task difficult. However, if the listening task had been too easy, the scores and brain activity might have been driven by participants' proficiency. As the purpose of our research was to determine the effect of prior knowledge, not the effect of proficiency, we used extremely fast-rate speech: had the speech not been fast enough, high-proficiency learners would have been able to comprehend without reading the transcript, while low-proficiency learners would not. In this way, we tried to avoid the influence of proficiency. Further analysis of low-proficiency learners will be required for comparison with the present results. Second, although the stimulus level should be similar across both languages, it is impossible to match the speed of the materials in both languages; therefore, the materials in the two languages may not have been exactly matched linguistically. Third, experiments clarifying how individual abilities, such as working memory or attention control, influence this brain activity could be a topic for future studies.
Fig. 7. Correlation between the mean activity of the left angular/superior temporal gyrus and TOEIC score gains from intensive L2 listening training. P < 0.01, Spearman's ρ correlation. The means of the parametric estimates in the area under the L2_TR condition are plotted on the vertical axis. L2, second language; gain logit, standardized scores of TOEIC test improvements from intensive L2 listening training.
Finally, because we used only pre-listening transcript reading, future work should examine whether other types of prior knowledge can likewise support speech comprehension and learning, so as to generalize the effectiveness of this approach to L2 acquisition.
Nevertheless, despite these limitations, the current research contributes to the understanding of the neural mechanisms underlying an effective method of learning L2 listening. By comparing brain activity while listening to incomprehensible (challenging or effortful) speech with and without prior knowledge in L1 and L2, we revealed how prior knowledge is used, when necessary, to decode sensory input, and how this brain activity contributes to language learning. Our data provide evidence that activity in the left AG/posterior STG is related to prior knowledge-sensory input integration and can predict a learner's potential to succeed in subsequent intensive L2 listening training. These results also suggest that if a learner cannot make effective use of prior knowledge, learning will not be successful. More specifically, for second language learners, even when the speech or material in L2 is incomprehensible or challenging, learning can be successful if the learner makes use of prior knowledge.
Being able to predict sensory input using prior knowledge contributes not only to comprehension but also to learning. Comprehensible input by itself, like incomprehensible input, might not contribute to learning; however, compensating for incomprehensible input with prior knowledge may lead to effective learning. Accordingly, integrating high-certainty top-down inferences (gained by transcript reading) with bottom-up decoding (of fast-rate speech) would lead to effective learning of L2 listening. This study revealed neural mechanisms related not only to an effective method for improving listening skills by enhancing processing speed in L2 speech but also to an important feature of learning: gaining new information by making use of prior knowledge.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.