A Corpus of Writing, Pronunciation, Reading, and Listening by Learners of English as a Foreign Language

In order to develop effective teaching methods and computer-assisted language teaching systems for learners of English as a foreign language who need to study the basic linguistic competences for writing, pronunciation, reading, and listening, it is necessary to first investigate which vocabulary and grammar they have or have not yet learned. Identifying such vocabulary and grammar requires a learner corpus for analyzing the accuracy and fluency of learners’ linguistic competences. However, it is difficult to use previous learner corpora for this purpose because they have not compiled all the types of linguistic data that we need. Therefore, this study aimed to solve this problem by designing and developing a new learner corpus that compiles linguistic data regarding the accuracy and fluency of the four basic linguistic competences of writing, pronunciation, reading, and listening. The reliability and validity of the learner corpus were partially confirmed, and practical application of the learner corpus is reported here as case studies.


Introduction
The globalization of society has increased opportunities to use English even where English is a foreign language (EFL). This change in circumstances outside the classroom has altered the purpose of EFL teaching in Asian countries (Kirkpatrick, 2012; Yamada, 2015). Before this change, EFL teaching primarily targeted reading and writing, that is, written language competences, and did not concentrate on spoken language competences such as listening and speaking. Present EFL teaching, however, aims to develop both spoken and written language competences (Canale & Swain, 1983).
In addition, current EFL teaching aims to develop EFL learners who use English accurately and at natural speed (fluently). Fluency in terms of speed is not crucial in written language, because EFL learners have time to reconsider their use of vocabulary and grammar. It is crucial in spoken language, however, because learners need to understand English spoken at natural speed and to speak English at natural speed without reconsidering vocabulary and grammar.
These two aims can be met by providing EFL learners with basic practice in the use of vocabulary and grammar in listening, speaking, reading, and writing that fits each learner's proficiency. To this end, it is necessary to first investigate EFL learners' basic linguistic competences and identify unlearned vocabulary and grammar for each competence. Recently, such identification of learners' linguistic problems has been carried out through error analysis of a learner corpus, because corpus-based analysis examines learners' actual language use. Effective teaching methods and computer-assisted language teaching (CALT) systems for improving communicative competence in spoken and written English can then be developed.
A learner corpus for this purpose should enable analyses of multiple linguistic competences for each learner from the viewpoints of accuracy and fluency. However, most of the previous learner corpora targeted the linguistic competences of writing and speaking (Minematsu et al., 2003; Nicholls, 2003; Izumi et al., 2004; Granger et al., 2009; Yasuda et al., 2009; Gilquin, 2010), while a few corpora compiled listening data (Rytting et al., 2014; ten Bosch et al., 2015) or reading data (Meurers et al., 2010; Ott et al., 2012). In addition, all of the previous learner corpora enabled analyses of accuracy, but not of fluency. Hence, it was concluded that we need to compile a learner corpus for analyzing the accuracy and fluency of multiple linguistic competences.
Given the present situation of EFL teaching and research on learner corpora, the present study compiled a learner corpus with two purposes: (1) to compile data for close investigation of EFL learners' accuracy and fluency in writing, pronunciation, reading, and listening; and (2) to compile data for the development of basic modules for a CALT system that can choose basic practices appropriate to the target EFL learners' proficiency level.
The learner corpus proposed in this study will contribute to the development of automatic methods for evaluating learners' language use in different modes, in terms of both accuracy and fluency, which should lead to better computer-assisted language learning systems. The learner corpus will also help in analyzing the learning process from these perspectives, and the findings of such analyses will contribute to the development of automatic methods for evaluating whether learning materials suit learners' proficiency.

Previous Learner Corpora
This section introduces the principal learner corpora developed in previous research according to the linguistic competences targeted. Table 1 summarizes the previous learner corpora.
First, we have learner corpora that compiled linguistic data in writing (Nicholls, 2003; Granger et al., 2009; Yasuda et al., 2009, among others). Data contained in these corpora are sentences written by EFL learners and annotated linguistic information such as part-of-speech information, syntactic information, and information on linguistic errors in word choice and compliance with grammatical rules. Hence, these corpora are effective for investigating vocabulary and grammar that EFL learners have not yet learned or find difficult to use. These corpora can be used for error analysis in order to investigate which sentences are accurately or inaccurately used by EFL learners. Among these corpora, the corpus of Yasuda et al. (2009) can also be used to investigate which sentences EFL learners can use fluently because it contains data on EFL learners' level of confidence in the accuracy of a sentence.
Second, we have learner corpora that compiled linguistic data on speaking (Minematsu et al., 2003; Izumi et al., 2004; Gilquin, 2010, among others). These corpora can be further classified into two types: one type recorded the speech sound of EFL learners reading aloud English sentences that were assigned by the researchers (Minematsu et al., 2003), and the other recorded both speech sound and sentences that they generated in conversation (Izumi et al., 2004; Gilquin, 2010). The former can be used to investigate pronunciation errors, and the latter to investigate both pronunciation errors and errors regarding vocabulary and grammar when generating a sentence.
Third, we have learner corpora that compiled linguistic data on reading (Meurers et al., 2010; Ott et al., 2012), although the target learners were learners of German as a foreign language, not EFL learners. Data contained in these corpora are sentences from reading materials and answers for reading comprehension questions written by the learners. These corpora can be used to analyze errors in both reading comprehension and writing.
Lastly, we have learner corpora that compiled linguistic data on listening (Rytting et al., 2014; ten Bosch et al., 2015), although the target learners were learners of Arabic and Dutch as a foreign language, respectively. In these corpora, error tags of dictation were annotated on target words in each sentence. The former can be used to develop a listening error-correction tool, and the latter to investigate problems in listening comprehension and develop reference data for evaluating a computational model of human word recognition.

Data to Be Compiled
Table 2 shows the language use data that satisfied these two purposes. Written sentences (writing), speech sounds (pronunciation), and comprehension rate (in reading and listening) were used to analyze the accuracy of language use (error analysis). Other data, including processing rates (in writing, pronunciation, and reading) and confidence judgments, were used to analyze the fluency of language use (fluency analysis).
In Table 2, the comprehension rate in reading and listening refers to the percentage of correct answers to comprehension questions for written or spoken text that an EFL learner silently read or listened to. The processing rate in writing, pronunciation, and reading refers to the speed at which an EFL learner wrote, read aloud, and silently read a sentence in terms of the number of words processed per minute (WPM). The confidence judgment in writing refers to an EFL learner's score of their confidence in the accuracy of a sentence they generated as rated on a five-point Likert scale (1: confident, 2: somewhat confident, 3: average, 4: not very confident, or 5: not confident), and the confidence judgment in the other data refers to the EFL learner's score of the difficulty of sentence pronunciation or comprehension as rated on a five-point Likert scale (1: easy, 2: somewhat easy, 3: average, 4: somewhat difficult, or 5: difficult).
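The two rate measures defined above can be illustrated with a minimal sketch. It assumes that words are counted by splitting a sentence on whitespace and that the time spent on a sentence is available in seconds; the function names are hypothetical and are not taken from the data-collecting tool.

```python
def words_per_minute(sentence: str, seconds: float) -> float:
    """Processing rate: number of words processed per minute (WPM).

    Assumption: a word is any whitespace-delimited token.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    n_words = len(sentence.split())
    return n_words * 60.0 / seconds


def comprehension_rate(answers: list[str], key: list[str]) -> float:
    """Percentage of comprehension questions answered correctly."""
    if not key or len(answers) != len(key):
        raise ValueError("answers and key must be non-empty and equal in length")
    correct = sum(a == k for a, k in zip(answers, key))
    return 100.0 * correct / len(key)


# Hypothetical example: a five-word sentence processed in three seconds,
# and four of five comprehension questions answered correctly.
print(words_per_minute("The quick brown fox jumps", 3.0))
print(comprehension_rate(["a", "b", "c", "d", "a"], ["a", "b", "c", "a", "a"]))
```

A different tokenization (for example, one that splits contractions) would change the WPM values, so the counting rule should be fixed before rates are compared across learners.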

Language Use Data Collection Method
To collect language use data, the EFL learners were asked to perform four tasks in the order shown in Table 3.
After the reading task, the EFL learners were given a 60-minute break. In the listening task, EFL learners listened to four audio news clips sentence-by-sentence only once, using a data-collecting tool with headphones. After listening to each sentence, they estimated their confidence in their comprehension of the sentence and selected their confidence judgment on the five-point Likert scale explained in Section 3.1. After finishing each clip, they answered five multiple-choice comprehension questions. Multiple choices for confidence judgments and comprehension questions were shown on the data-collecting tool's computer screen, which also recorded the EFL learners' choices.
The reading task proceeded similarly to the listening task. There were four news articles, which were read silently sentence-by-sentence, and the EFL learners gave their confidence judgments for their comprehension of each sentence on the same five-point scale as above, and answered comprehension questions.
The pronunciation task also proceeded similarly to the reading task, except that comprehension questions were absent and the task was performed at a different location. Comprehension questions were omitted because the articles were the same as those used in the reading task. EFL learners read sentences aloud using the data-collecting tool with a unidirectional electric-condenser microphone mounted on a stand placed in front of the mouth in a sound-attenuated recording booth (width: 1,700 mm, depth: 1,900 mm, and height: 2,100 mm). Speech sound data were recorded on a solid-state stereo with a sampling rate of 44.1 kHz and a quantization level of 16 bits.
In the writing task, EFL learners were asked to write sentences describing four pictures and to answer 20 questions about their own background and computer skills. The EFL learners were asked to write at least five sentences for each picture, and at least one sentence for each background question. After writing each sentence, they gave their confidence judgments as above. The data-collecting tool's screen showed the pictures and questions, and recorded the sentences, writing time, and choices for confidence judgments.
In these data collection tasks, EFL learners were asked to complete each task as fast as possible during the allotted time, and to stop working either when the task was completed or when the experimenter and data-collecting tool alerted them to the end of the allotted time. EFL learners were prohibited from using dictionaries or any other reference books, and the data-collecting tool did not allow EFL learners to return to review a sentence after moving on to another sentence in the listening, reading, and pronunciation tasks, or to revise or modify a sentence in the writing task.

Learners
Ninety EFL learners who were enrolled in either undergraduate or graduate programs at the university (males: 48 and females: 42) took part in the data collection tasks and were paid for their participation. EFL learners at university were chosen because, although Japanese students learn English for 6 years in junior and senior high school, low linguistic competences among Japanese students have been recognized as a problem, leading universities to offer remedial English classes. The mean age of EFL learners in the present study was 21.5 years [range 19-40 years, standard deviation (SD) 2.6]. The EFL learners were asked to submit valid Test of English for International Communication (TOEIC) scores, taken in the current or previous year. The learners confirmed that they had basic computer skills such as typing with a keyboard and controlling a mouse.
The EFL learners were classified into three groups based on their TOEIC scores, each of which consisted of 30 learners. The TOEIC scores in the beginner group ranged from 280 to 485, those in the intermediate group ranged from 490 to 725, and those in the advanced group ranged from 730 to 985. Table 4 shows the descriptive statistics for the EFL learners' TOEIC scores by group. Figure 1 shows the distribution of the EFL learners' TOEIC scores. Comparing the mean TOEIC score of the EFL learners in this study (633.8) with that of EFL learners in all of Japan (583.5) provided by Educational Testing Service showed that the participants in this study had a higher mean score. The difference in the mean scores likely arose from the fact that the EFL learners in this study were limited to undergraduate or graduate students, while the nationwide mean also included junior and senior high school students' scores. Thus, our learner corpus should be regarded as representative data for EFL learners at the university level.

Materials
The purpose of our learner corpus was to provide a base for researching EFL learners' grammatical competence with respect to sentence comprehension and generation. Therefore, completing each task required only basic grammatical competence, not advanced linguistic competence in rhetoric and discourse styles or non-linguistic knowledge such as background or cultural knowledge.
Materials for the listening, reading, and pronunciation tasks were chosen by focusing on the use of grammatical competence. News articles were chosen for these tasks because the target EFL learners were undergraduate or graduate students who needed to learn vocabulary and grammar used in authentic materials such as news articles, and because news articles are often used as teaching materials.
The news articles were taken from the two main sections of the Voice of America (VOA) website (http://www.voanews.com): the special section prepared for English learners, and the editorial section prepared for native English speakers. Articles posted in the special section contain short, simple sentences built from VOA's basic vocabulary of 1,500 words, and avoid idiomatic expressions. In contrast, there are no restrictions on vocabulary or sentence construction in articles posted in the editorial section as long as they are appropriate as news articles for native English speakers.
The news articles and clips for the listening task were also taken from the special and editorial sections of the VOA website. In addition to the lexical and syntactic restrictions on special section news articles, news clips are phonetically restricted, and the speed of news clips posted in the special section is two-thirds slower than those in the editorial section, which are read aloud at a native English speaker's natural speed of approximately 250 syllables per minute (Robb & Gillon, 2007).
The same news articles were used as the materials for the reading and pronunciation tasks. This overlap was necessary to allow the EFL learners to focus on pronunciation, and to minimize the influence of the article contents on learners' linguistic performance.
Tables 5 and 6 summarize the properties of the selected materials for the listening, reading, and pronunciation tasks. The mean number of words in a sentence shows that sentences from the editorial section are longer than those from the special section. This is also evident from the length of the shortest and longest sentences. Tables A and B (Appendix) provide examples of news articles for the reading task from the special section and editorial section, respectively. Each news article or clip contained five multiple-choice comprehension questions following the format of Nation and Malarcher (2007): two correct answer options; two incorrect answer options; and an alternative option. An example of the comprehension questions used is provided in Table C (Appendix).
Similar to the selection of materials for the other tasks, two types of materials were chosen in order for EFL learners to complete the writing task by using basic grammatical competence, not advanced linguistic competence in rhetoric and discourse styles or non-linguistic knowledge such as background or cultural knowledge: pictures to describe and questions to answer. Four pictures representing the flow of an event on the street where four people appear were taken from a writing test (Hughes, 2003). These pictures were chosen because no special terminology was required to describe the event (Figure A, Appendix).
The EFL learners were asked 15 questions about their academic background, taken from Ehrman (1996), and five questions about their computer literacy, taken from Eignor et al. (1998) (Table D, Appendix). These questions were chosen because no special terminology was required to answer them, and the EFL learners could answer questions about their own background more easily than questions about other topics such as business or academic issues. The questions consisted of thirteen constituent interrogative questions, five polar interrogative questions, and two command questions.

Amount of Language Use Data
Table 7 summarizes the language use data collected in the writing, pronunciation, reading, and listening tasks, which satisfy the two purposes of this study regarding the types of language use data to be compiled. The writing data collected consisted of more than 4,000 sentences (29,115 words); this is 1.1 times the minimum number of 3,600 sentences to be collected (90 EFL learners × 40 sentences each). On the other hand, the pronunciation, reading, and listening data fell short of the expected 7,200 data pieces (90 EFL learners × 80 sentences each) by 7, 18, and 396 data pieces, respectively. The total length of the speech sound data collected was 28.9 hours.
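The expected counts above follow from the task design and can be checked with a few lines of arithmetic. The sketch below uses only figures stated in this paper; the variable names are illustrative.

```python
N_LEARNERS = 90

# Writing task: at least 5 sentences for each of 4 pictures,
# plus at least 1 sentence for each of 20 questions.
MIN_WRITING_PER_LEARNER = 4 * 5 + 20 * 1          # 40 sentences
min_writing_total = N_LEARNERS * MIN_WRITING_PER_LEARNER

# Pronunciation, reading, and listening: 80 sentences per learner per task.
SENTENCES_PER_TASK = 80
expected_per_task = N_LEARNERS * SENTENCES_PER_TASK

# Shortfalls reported for each task.
missing = {"pronunciation": 7, "reading": 18, "listening": 396}
collected = {task: expected_per_task - n for task, n in missing.items()}
coverage = {task: 100.0 * c / expected_per_task for task, c in collected.items()}

print(min_writing_total, expected_per_task)
for task in missing:
    print(f"{task}: {collected[task]} data pieces ({coverage[task]:.1f}% of expected)")
```

Under these figures, the listening data show the largest shortfall, with 6,804 of the expected 7,200 data pieces collected.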

Reliability
We assessed the reliability of the language use data following a validation procedure for language tests (Brown 1996). Reliability was determined by investigating the extent to which language use data were consistent among EFL learners. This is because reliable data should yield similar results for EFL learners at the same proficiency level.
Reliability and internal consistency were analyzed in terms of Cronbach's alpha reliability coefficient (Cronbach 1970), which is defined in Equation 1, where $\alpha$ is the reliability coefficient, $k$ is the number of items in the language use data ($k = 9$), $S_i^2$ is the variance associated with item $i$, and $S_T^2$ is the variance associated with the sum of all $k$ item values. Cronbach's alpha reliability coefficient, which ranges from 0 (absence of reliability) to 1 (absolute reliability), is empirically judged as satisfactory if it is above 0.8.

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} S_i^2}{S_T^2}\right) \qquad (1)$$

A high reliability coefficient (α = 0.89) was observed. This reliability coefficient indicates that the language use data were 89% consistent, or reliable, with 11% random variance. This suggests a high reliability of the collected language use data.
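As an illustration of how Cronbach's alpha is computed from the item variances and the variance of the summed scores, the following minimal sketch uses sample variances from the Python standard library. The data layout (one row per learner, one column per item) and the toy values are assumptions made for this example; they are not the corpus data.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a learners-by-items table.

    scores[j][i] is the value of item i for learner j (assumed layout).
    """
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two learners and two items")
    k = len(scores[0])
    # Variance of each item across learners (S_i^2).
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of each learner's summed score (S_T^2).
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("summed scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data (hypothetical): three items that rank four learners identically.
toy = [
    [1.0, 2.0, 1.5],
    [2.0, 3.0, 2.5],
    [3.0, 4.0, 3.5],
    [4.0, 5.0, 4.5],
]
print(round(cronbach_alpha(toy), 3))
```

When items are measured in different units, as with rates and Likert scores, the values would normally need to be put on a comparable scale before the variances are summed; how this was handled for the nine items is not stated here.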

Construct Validity
We assessed the validity of the language use data following a validation procedure for language tests (Brown 1996). Validity was analyzed in terms of construct validity and criterion-related validity.
Construct validity was determined by investigating the degree to which the language use data could classify the EFL learners into a beginner, intermediate, or advanced group based on their level of English proficiency. Whether the language use data could classify the EFL learners into the three proficiency level groups was investigated with analysis of variance (ANOVA) with p ≤ 0.001 representing statistical significance.
Tables 8, 9, and 10 summarize the means and SDs of the language use data for the three proficiency level groups. The mean values were calculated by dividing the sum of the data values for each task by the number of EFL learners per group (n = 30). The one-way ANOVA showed statistically significant differences among the three proficiency level groups in every type of language use data, as shown in Table 11. However, comparisons between pairs of groups (Table 12) left the construct validity of the language use data open to doubt, since statistically significant differences were not observed in most of the language use data between the beginner and intermediate groups. On the other hand, the results confirmed the construct validity between the beginner and advanced groups, and between the intermediate and advanced groups. Specifically, EFL learners in the advanced group could read and listen to English sentences more accurately than those in the beginner and intermediate groups, while accuracy in reading and listening comprehension was similar between the beginner and intermediate groups. In addition, the advanced group could read, listen to, write, and pronounce English sentences more fluently than the other two groups, and the intermediate group could read and pronounce English sentences more fluently than the beginner group. However, writing fluency was similar between the beginner and intermediate groups.
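The group comparison can be sketched as a one-way ANOVA F statistic computed from between-group and within-group sums of squares. This is a minimal illustration with toy values, not the analysis code used for Table 11. Obtaining a p-value additionally requires the F distribution, which a statistics package provides (for example, `scipy.stats.f_oneway`).

```python
def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation")
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy processing rates (hypothetical) for beginner, intermediate, advanced.
beginner = [1.0, 2.0, 3.0]
intermediate = [2.0, 3.0, 4.0]
advanced = [5.0, 6.0, 7.0]
print(one_way_anova_f([beginner, intermediate, advanced]))
```

A significant F value only indicates that at least one group differs; which pairs of groups differ has to be established by separate pairwise comparisons.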

Criterion-related Validity
Criterion-related validity was examined through the correlation between the language use data and the EFL learners' TOEIC scores. The language use data demonstrated marginal criterion-related validity. A statistically significant correlation was observed between the language use data and TOEIC scores; absolute Spearman rank-order correlation coefficients ranged from 0.36 to 0.73 (p < 0.01), as shown in Table 13. These results suggest moderate criterion-related validity of the language use data, excluding data for confidence judgments in writing (r = 0.36). This low correlation is likely due to the fact that the EFL learners in the present study could choose the sentences they wanted to write, unlike in the pronunciation, reading, and listening tasks, in which they had to use sentences provided by the researchers. Hence, the confidence judgments in writing appear to be largely independent of the proficiency of the EFL learners. On the other hand, the processing rate in writing showed a high correlation, which suggests that the processing rate data should be used in order to analyze learners' writing competences.
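A Spearman rank-order correlation can be sketched as a Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below is a minimal pure-Python illustration with hypothetical values; the function names are not taken from the original analysis.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("ranks have zero variance")
    return cov / (sx * sy)


# Hypothetical TOEIC scores and reading rates (WPM) for four learners.
toeic = [280.0, 490.0, 730.0, 985.0]
reading_wpm = [60.0, 75.0, 90.0, 120.0]
print(abs(spearman_rho(toeic, reading_wpm)))
```

Taking the absolute value matters for the Likert-scale judgments, where lower scores mean higher confidence or lower difficulty, so a strong relationship with proficiency shows up as a negative coefficient.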

Usability Evaluation of Language Use Data
This section discusses the usability of the language use data from the viewpoint of providing a research base for the development of basic modules of a CALT system. The language use data collected in the present study (Table 2) can serve as training data for the statistical development of various basic modules of a CALT system. Among the various possibilities, we actually developed two basic modules: statistical methods for measuring the readability or listenability of a sentence (Kotani et al., 2012, 2014), which are reported in this section as case studies.
In English teaching, it is necessary to choose materials with a level of readability or listenability that is appropriate to the target EFL learners' proficiency, since inappropriate materials will negatively affect teaching efforts and decrease motivation among learners. The readability or listenability of authentic materials is usually unknown, unlike materials in textbooks prepared specifically for EFL learners.
Using previously developed automatic readability or listenability measuring methods for authentic materials is problematic because EFL learners' proficiency is not taken into account. Unlike native speakers, EFL learners have different proficiencies; hence, it is necessary to take individual differences of EFL learners' proficiency into account when measuring readability or listenability.
To fix this problem, Kotani et al. (2012, 2014) developed methods to measure the readability or listenability of a sentence according to EFL learners' proficiency using multiple regression analysis. To account for individual differences, Kotani et al. (2012, 2014) considered scores on the difficulty of sentence comprehension among EFL learners to be dependent variables in the multiple regression analysis, and EFL learners' proficiency to be one of the independent variables (the other independent variables were properties representing the linguistic complexity of a sentence). The scores judged by the EFL learners reflected the individual differences in proficiency because they were subjectively judged by the EFL learners themselves, as discussed in Section 3.2, while the EFL learners' proficiency was represented in terms of TOEIC scores, as discussed in Section 3.3.
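The regression setup described above can be sketched as an ordinary least-squares fit with the difficulty score as the dependent variable. The sketch below is a generic illustration and not the method of Kotani et al.: it assumes two independent variables, a TOEIC score divided by 100 and sentence length in words, where sentence length is a hypothetical stand-in for the linguistic complexity properties, and all data values are invented.

```python
def fit_ols(X: list[tuple[float, ...]], y: list[float]) -> list[float]:
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def predict(coefs: list[float], features: tuple[float, ...]) -> float:
    """Predicted difficulty score for one (learner, sentence) pair."""
    return coefs[0] + sum(c * f for c, f in zip(coefs[1:], features))


# Invented training data: (TOEIC / 100, sentence length in words) and the
# learner's difficulty judgment on the five-point scale (1: easy, 5: difficult).
X = [(3.0, 10.0), (5.0, 10.0), (7.0, 20.0), (9.0, 15.0), (4.0, 25.0)]
y = [3.6, 3.0, 2.9, 2.05, 4.05]

coefs = fit_ols(X, y)
print(predict(coefs, (6.0, 18.0)))
```

Including proficiency as an independent variable is what lets the same sentence receive different predicted difficulty for learners with different TOEIC scores.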

Conclusion
Motivated by the present situation of EFL teaching and previous research on learner corpora, the present study compiled a learner corpus that could (1) demonstrate the accuracy and fluency of EFL learners' language use to comprehend a sentence when reading and listening, and to generate a sentence in writing and speaking (pronunciation); and (2) serve as a language resource for the statistical development of various basic modules of a CALT system.
This study implies that learner corpus research should examine different linguistic skills from different perspectives. Since EFL learners are still in the process of learning, their linguistic skills are not yet stable. Thus, an EFL learner may succeed in comprehending a sentence when reading it, but fail to read the same sentence aloud fluently. As shown in Table 12, the beginner and intermediate groups read English sentences aloud at different processing rates, but no statistically significant difference was observed in their processing rates for writing and reading. If learner corpus research compiles language use data for different linguistic skills from different perspectives, the contribution of learner corpus data to the development of CALT systems will be further enhanced.
A remaining problem of our learner corpus is that it needs to be extended both quantitatively and qualitatively. These improvements are necessary in order to gain more detailed information about EFL learners and their language use. In our learner corpus, the language use data covered four types of language use by EFL learners at three proficiency levels; in the future, corpus data should also be compiled for other types of language use, such as essay writing, speaking, narrative reading, and listening to lectures, from EFL learners at different levels of proficiency, along with data on learning experiences and language aptitudes, in order to analyze language use in more detail.

Sentences 2 to 14 of the example news article in Table B (Appendix):
2 These actions will help stem the flow of finances to and inhibit the travel of this dangerous operative.
3 The designation of Fahd al-Quso highlights U.S. action against the threat posed to the United States by al-Qaida in the Arabian Peninsula, said U.S. Ambassador for Counterterrorism Daniel Benjamin.
4 The joint designation by the United States and the United Nations alerts the public that Fahd al-Quso is actively engaged in terrorism.
5 "These actions," said Ambassador Benjamin, "expose and isolate individuals like al-Quso and result in denial of access to the global financial system."
6 Prior to the formation of al-Qaida in the Arabian Peninsula, or AQAP, al-Quso was associated with al-Qaida elements in Yemen and involved in the 2002 USS Cole bombing in the Port of Aden, which killed seventeen sailors.
7 He was jailed in Yemen in 2002 for his part in the attack.
8 Following al-Quso's release from prison in 2007, he joined al-Qaida in Yemen.
9 In November 2009, al-Quso was added to the list of the FBI's most wanted terrorists.
10 Al-Quso is connected to other designated AQAP senior leaders, including Anwar al-Awlaqi, Nasir al-Wahishi, and Said Ali al-Shiri, and acts as a cell leader in Yemen.
11 In May 2010, al-Quso appeared in an al-Qaida in the Arabian Peninsula video in which he threatened to attack the U.S. homeland, as well as U.S. embassies and naval vessels abroad.
12 The terrorist designation blocks all al-Quso's property interests subject to U.S. jurisdiction and prohibits U.S. citizens from engaging in transactions that benefit al-Quso.
13 In addition to the U.S. domestic action, the United Nations Sanctions Committee's listing will require all U.N. member states to implement an assets freeze, a travel ban, and an arms embargo against al-Quso.
14 The actions taken against the AQAP operative demonstrate international resolve in eliminating its ability to execute violent attacks and to disrupt, dismantle, and defeat their networks.

Table 1. Specifications of previous learner corpora

Table 2. Language use and corpus data

Table 3. The four tasks used to collect language use data

Table 5. Properties of the materials selected for the listening task

Table 7. Language use data of the EFL learners

Table 8. Comprehension rate (%) of the three proficiency level groups

Table 9. Confidence judgments of the three proficiency level groups

Table 11
Criterion-related validity was determined by examining the extent to which the language use data correlated with the scores of a well-established English test, TOEIC. According to the TOEIC technical manual (Chauncey Group International 1998), TOEIC scores correlate with those of other tests, such as the Comprehensive Adult Student Assessment System, the Test of English as a Foreign Language, and the Canadian Language Benchmark Assessment, with correlation coefficients ranging from 0.73 to 0.87. In the present study, we set the lowest of these correlation coefficients (r = 0.73) as the target correlation coefficient between the language use data and the EFL learners' TOEIC scores.

Table B. Example of a news article from the VOA editorial section
1 The United States and the United Nations have listed Al-Qaida in the Arabian Peninsula fugitive Fahd al-Quso as a Specially Designated Terrorist.