Write to improve: exploring the impact of an automated feedback tool on Bahraini learners of English

The development of written accuracy among learners of English as a Second Language (ESL) has long been a primary concern for ESL teachers and for researchers in Applied Linguistics and Second Language Acquisition (SLA). While a vast body of research has examined written corrective feedback on students' written products, few studies have focused on the development of written accuracy among Arabic-speaking learners of English using automated feedback tools. This case study first examined the level of written accuracy of Bahraini learners of English in their second year at a higher education institute, highlighting the frequency of errors influenced by their first language (Arabic). The course following this first stage included a significant component of automated feedback on students' writing, and this study explored the impact that the use of these feedback tools had on learners' writing in English, tracking development over the course of an academic semester. A corpus of students' initial writings and subsequent revisions was analysed to identify whether there was an improvement in the accuracy of students' texts, and students' perceptions were elicited.

Context
Bahrain Polytechnic offers a foundation programme with two core English courses, in which students are enrolled based on their results in the entry test. Upon completion of the foundation requirements, students are eligible to take degree English courses; some students' entry results qualify them for direct entry to degree programmes. The degree programmes at Bahrain Polytechnic include four mandatory English language courses. The first two are English communication courses normally taken in Degree Year 1; the other two are English for Specific Purposes courses normally taken in Degree Year 2. Not all students are registered in the degree year stated, for class-capacity reasons.


Introduction
Feedback is a crucial part of learning in general, and of language learning in particular. One of the most important elements of effective feedback is its timing (Boud & Molloy, 2013; Nicol, 2010), in addition to its being understandable, specific and contextualised. Timeliness in particular can be challenging in a contemporary higher education environment with high student numbers and large teacher workloads. In this context, automated feedback tools offer considerable potential, as they can provide feedback at a time and place of a student's choosing. The feedback students seek can thus be immediate in some cases, overcoming the lag involved in waiting until teachers have time in their busy workloads to provide it. This is particularly useful for student writing, as the development of written accuracy in language learning is an iterative process and student writing can improve significantly through multiple feedback cycles (Fernández-Toro & Hurd, 2014; Sheen, 2007). However, automated feedback also has potential drawbacks and challenges, in particular the perception that online feedback can be impersonal (Guardado & Shi, 2007), although this depends to some extent on the type of automated feedback tool used. In addition, online feedback relies to an important extent on students being self-directed and taking responsibility for acting on the feedback they receive. This study focuses on one such tool, Write & Improve (Cambridge English, 2018). This tool offers specific features that help to overcome some of the perceived disadvantages of automated feedback, such as the option to provide contextualised information in the form of a teacher's personally designed workbook. Overall, we explore in this paper whether automated feedback tools such as Write & Improve can enhance the written language development of Arabic-speaking learners of English.

Literature review
A considerable amount of research has examined the specifics of Arabic-speaking learners of English from a variety of angles. Khuwaileh and Al Shoumali (2000), for example, studied correlations between writing skills in Arabic and writing skills in English, and concluded that poor writing in English correlates with similar deficiencies in the mother tongue. Other studies focus on specific elements of written English and how these may be specific to Arabic-speaking learners of English (e.g. Sawalmeh, 2013; Mourssi, 2013; Crompton, 2011; Mahmoud, 2005). In addition, a range of studies address the role and practice of written corrective feedback on students' written productions in different linguistic and cultural contexts (e.g. Rummel & Bitchener, 2015; Bitchener, 2008), and written corrective feedback has been shown to develop language accuracy over time (Ferris, 2016; Evans, Hartshorn & Strong-Krause, 2011). Kang and Han (2015) conducted a meta-analysis of the efficacy of written corrective feedback in improving second language written accuracy. Their study covered 21 primary studies on the topic, and they concluded that the efficacy of written corrective feedback "is mediated by a host of variables, including learner's proficiency, the setting, and the genre of the writing task" (p. 1). This is echoed in Winstone et al.'s (2017) recent study, in which they identify a range of variables that impact on what they call the "proactive recipience of feedback", which refers to "a state or activity of engaging actively with feedback processes, thus emphasizing the fundamental contribution and responsibility of the learner" (p. 31). While their study addresses feedback in a broader sense and is not primarily focused on language correction, it is still relevant to our study because it allows us to zoom in on the efficacy of online feedback, especially when considering the learner's proficiency and in particular as it relates to the setting.
This is further reinforced by Nicol (2010), who has developed a set of ten principles of effective feedback, worth outlining in full as it provides a useful analytical tool with which to evaluate automated feedback tools such as Write & Improve. Nicol (2010, pp. 212-213) argues that for written feedback to be effective, it should be:
1. Understandable - expressed in a language that students will understand.
2. Selective - commenting in reasonable detail on two or three things that students can do something about.
3. Specific - pointing to instances in the student's submission where the feedback applies.
4. Timely - provided in time to improve the next assignment.
5. Contextualised - framed with reference to the learning outcomes and/or assessment criteria.
6. Non-judgemental - descriptive rather than evaluative, focused on learning goals, not just performance goals.
7. Balanced - pointing out the positive as well as areas in need of improvement.
8. Forward looking - suggesting how students might improve subsequent assignments.
9. Transferable - focused on processes, skills and self-regulatory processes, not just on knowledge content.
10. Personal - referring to what is already known about students and their previous work.

There is also a considerable body of research on online feedback tools, especially those with a focus on academic writing. Within this body of research, there is a range of literature about automated grading tools (e.g. Ware & Warschauer, 2006), but less about automated systems that provide students with feedback (e.g. Cheng, Law, & Wong, 2016; Czaplewski, 2009), and even then it is often linked to grading (Matthews, Janicki, He, & Patterson, 2012). "Systems that generate feedback on written work through sophisticated computer-generated models have been promoted as cost-effective ways of replacing or enhancing direct human input" (Ware & Warschauer, 2006, p. 110). Such systems provide feedback on grammar, content, mechanics and organisation. Automated Writing Evaluation (AWE) and Automated Essay Scoring (AES) software (Chen & Cheng, 2008), which provide feedback to students, have been studied in recent years. Stevenson and Phakiti (2014) and Stevenson (2016) have identified that while there is modest evidence of positive effects on the quality of the written texts that students produce, there is little evidence as yet that the effects of AWE transfer to more general improvements in writing proficiency. Some of the literature on AWE includes a focus on how it can be used to provide formative writing feedback (e.g. Li, Link, & Hegelheimer, 2015; Wang, Shang, & Briody, 2013; Grimes & Warschauer, 2010), which is the aim of Write & Improve, the tool we focus on in this paper.
Arguably the most well-known AWE tool is Grammarly (www.grammarly.com), which focuses on grammar-level feedback (Dembsey, 2017). Grammarly offers context-specific suggestions for the sentence-level grammatical errors it identifies in students' texts (see Figure 1), but it is not always as effective as it could be in highlighting inaccurate structures or suggesting replacements for errors (ibid.).
Another AWE tool is Accuplacer (https://accuplacer.collegeboard.org/), which aims to support students and organisations in placing students at the right level based on their existing skills. It evaluates students' writing in terms of idea development, organisation, effective language use, sentence structure, and punctuation (Johnson & Riazi, 2015). Based on students' diagnostic test results, Accuplacer provides a personalised learning path in which students get instant feedback to revisit skills they need to acquire (see Figure 2) (ibid.). Another tool, MY Access! (https://www.myaccess.com), evaluates and provides feedback on lexical complexity, syntactic variety, discourse structures, grammatical usage, word choice, and content development (Chen & Cheng, 2008, p. 94) (see Figure 3). Although Chen and Cheng's (2008) sample of students did not favour using MY Access!, it did improve formal aspects of participants' writing such as word choice and sentence structure. The Criterion online essay evaluator (https://criterion.ets.org/) is another tool that provides instant feedback on organisation and development, grammar, mechanics, and style (Attali, 2004). In his large-scale field implementation of Criterion, Attali (2004) concluded that the system helped students revise and edit their subsequent drafts, reducing their errors by the end of the intervention. Despite the development of these different tools, there is overall a lack of freely available AWE software, and commercial options are often not tailored to the needs of learners of English from specific linguistic and cultural backgrounds (Jordan & Snyder, 2012; Dörnyei, 2005).

Methodology
The feedback tool

The Cambridge English Write & Improve interface aims to make students better writers. It is a user-friendly website where students can sign in easily through a teacher's personally designed workbook, a key feature that allows the tool to be tailored to specific contexts. Students can also access a variety of well-constructed topics at three language levels: beginner, intermediate and advanced (see Figure 4). Furthermore, it offers an International English Language Testing System (IELTS) workbook with a variety of writing topics for testing purposes. Once students choose the workbook or writing topic of their preference, they are transferred to a writing page similar to that of a word processor, where they type their writing using basic tools (see Figure 5). Students can track their word count as they type and finally click on 'check' to get their writing checked within seconds. The submitted text is returned in a shaded format suggesting colour-coded changes, with a few symbols indicating the type of error, as shown in Figure 6. The interface acts as a motivating teacher awaiting students' actions to revise and edit their drafts. It thus assumes, as well as attempts to stimulate, some degree of 'proactive recipience of feedback' (Winstone et al., 2017). Students can then resubmit as many times as they want, and on each attempt they see their level on the Common European Framework of Reference for Languages (CEFR) scale, which describes language ability on a six-point scale from A1 (Beginner) to C2 (Mastery):

The CEFR is intended to stand as a central point of reference, itself always open to amendment and further development, in an interactive international system of co-operating institutions, whose cumulative experience and expertise produces a solid structure of knowledge, understanding and practice shared by all. (Trim, cited in University of Cambridge, 2011)

The scale is used to create a graph illustrating the progression of students' revisions and resubmissions. The interface also provides a 'Class View' for teachers to monitor students' development on an individual basis, as well as a class progress graph. Figure 7, for example, shows the whole class's progress, with each line indicating a student's performance. Teachers can view the type of changes each student makes and the errors they tend to ignore. The class view feature also shows the score range for each student on the CEFR levels (Figure 8), with the students' names (redacted) listed on the left and their progress on the linear scale.

Subjects
The study reported in this paper involved 53 Bahraini undergraduate business students in the third year of their degree programme. The participants' first language was Arabic, and their English proficiency level was assessed by the institution as B1 on the CEFR. Students were enrolled in a compulsory English course as part of their degree programme, by the end of which they were expected to reach CEFR B2.

Data collection
The course ran five hours per week for 16 weeks. The participants were asked to write three e-journals in three five-week blocks over the semester. Each e-journal was expected to consist of two paragraphs of 300-350 words each, reflecting on their experience of the teaching block. Each block addressed a number of skills and key language elements that students were expected to learn by the end of the semester, as outlined in Table 1. The e-journals followed the Claim, Evidence, Reasoning and Action (CERA) format, which is used with the aim of organising students' ideas before writing. Peacock (n.d.) illustrates this format in Table 2 below.
Each e-journal was a piece of cumulative writing over five weeks, submitted online as an uncontrolled assessment, i.e. one with no testing administration associated with it, which can therefore be submitted outside class time. Based on each week's theme, students were expected to identify and describe their weaknesses and/or the difficulties they faced and the actions they would take to address them in the future. The e-journals were assessed and compared based on a number of criteria included in the rubric: task fulfilment, CERA format, coherence, cohesion, vocabulary range/accuracy, grammar range/accuracy, punctuation, spelling, capitalisation, conventions and format. The participants followed the process-oriented writing method, whereby each could write and edit the draft using the Write & Improve interface to enhance the quality of the writing in terms of grammar, vocabulary and content, as the interface gives suggestions about vocabulary contexts related to students' writings. The Class View option in the interface was employed to monitor students' written modifications and the CEFR levels of their drafts. Students were expected to redraft and finalise their e-journals based on the Write & Improve feedback, to critique and appraise them, and finally to submit them with the appropriate academic tone and use of vocabulary. Moreover, a 4-point Likert post-questionnaire was distributed at the end of the semester to elicit students' perspectives on the usefulness of getting online feedback and their suggestions for features that would be useful to add to the interface. As required by the institution's ethics committee, informed consent was sought from participants in the study, and anonymity was guaranteed with regards to sharing students' writing and perceptions.

Table 2. The CERA format (Peacock, n.d.)
Question: (This is the question provided in the task.)
Claim: (Often you can use part of the question to formulate your claim. In an extended response, this will be your topic or thesis sentence.)
Evidence: (This is data gathered from text or graphics that help you answer the question provided in the task. Choose a quote or other evidence that directly supports your claim. If you use a quote, then be sure to credit the quote properly.)
Reasoning: (This is the most important part of your answer. It provides your reader with the explanation for your claim, and it explains how your evidence supports your claim. This is also where you should draw on key ideas and concepts from the discipline to tie your evidence to your claim.)
The evidence shows:
I know (relevant disciplinary ideas - i.e., scientific facts and concepts that help answer the question):
I can apply (relevant crosscutting concepts - i.e., big ideas that connect the concepts and evidence):
Therefore, I can conclude that:

Findings
As the present study sought to examine the usefulness of online writing feedback for Bahraini students' writing development, the teacher's class view in the interface was used to monitor students' edits and CEFR level upon each modification of their drafts. Students' CEFR progress was tracked to identify the types of grammatical errors they made and the changes they could address in terms of language accuracy, content and punctuation, as well as to provide an overall view of CEFR-level progress or decline.
Students' CEFR levels were identified in their first attempt at writing and using the online tool. It was found that the majority of students (39%) were at CEFR B1, which was their expected level for the course. Another 37% of students were at the lower CEFR A2 level and 15% at A1. Only 7% were at CEFR B2, which is the targeted exit level for the course (Figure 9). Students' levels were checked at the end of the intervention, as illustrated in Figure 10. The majority of students (56%) reached their targeted B2 level while 37% remained at the official B1 entry level, indicating that the findings cannot conclusively show that the changes in writing and language skills observed are attributable to using the Write & Improve feedback tool. This is because some of the students continued to produce recurrent errors while they could produce the same structures with no errors on other occasions. This might show that some degree of learning has occurred, but more practice is required for students to produce error-free structures. .  The questionnaire received 26 responses (61.5% females and 38.5% males), which equates to around half of the participants. The majority of the respondents (57.7%) were between 20 to 25 years old, 38.5% were 19 years old and only a few (3.8%) were 18 years old. The participants' responses evaluating their own writing skills showed that the majority (53.8%) rated it to be good, while 38.5% stated their writing skills were acceptable. Very few respondents thought their skills were excellent (3.8%) or poor (3.8%). As can be expected, the responses are to some extent about impression management, and respondents may thus indicate writing skills levels they think they should have or that would be deemed acceptable at the level they are at, which does not necessarily reflect the reality. This is one of the limitations of self-reporting (Larsen-Freeman & Long, 2014).
The questionnaire also investigated how students found the codes and other feedback in the interface (see Figure 11 for a sample). Around 61.5% of the respondents found the Write & Improve explanations or codes a bit difficult, while 34.6% believed they were easy to understand. Only 3.8% thought the codes were very difficult. In terms of the usefulness of the tool, Figure 12 below shows that most respondents found the tool useful (69.2% useful and 26.9% very useful), which is important, as it may have a flow-on effect for future use by the students and the teacher.
Although most students found the interface useful, less than half thought that the online automatic feedback was better than teacher feedback (50% disagreed and 3.8% strongly disagreed; see Figure 13). This is an important and relatively common finding, but it should be approached with some caution, as it often relates to perceptions about automated versus teacher feedback, which do not necessarily correlate with subsequent improvements in students' writing, as in Chen and Cheng's (2008) earlier mentioned example of MY Access! (see also Wilson & Andrada, 2016).
The rest of the respondents (46.2%) preferred the online feedback to the teacher's feedback, which is a surprisingly high percentage and may be due to the earlier mentioned importance that students often assign to the timeliness of feedback (Nicol, 2010). Nevertheless, as indicated in Figure 14 below, the majority of students (57.7% agreed and 23.1% strongly agreed) still expressed an interest in getting the teacher's feedback on their writing instead of the tool's feedback, which reinforces the findings of Chen and Cheng's (2008) study on MY Access!. The respondents' perceptions of what the tool helped them improve were also sought in terms of grammar, content and vocabulary (see Figure 15, Figure 16 and Figure 17). The students' responses showed that they believed the Write & Improve interface improved their grammar the most, with almost two thirds of the students (65.4%) agreeing and 23.1% strongly agreeing. The second most impacted writing area was content, with 57.7% (15 students) agreeing that the tool helped improve their content and ideas, and 19.2% (5 students) strongly agreeing. Vocabulary ranked third as an improved area, with half of the students agreeing, and 15.4% (4 students) strongly agreeing, that their vocabulary was improved by using the tool. Overall then, even if students voiced a preference for tutor-provided feedback, a large majority still found the automated feedback very useful and felt it improved their writing, especially when asked about specific elements such as grammar. Again, this aligns to some extent with Chen and Cheng's (2008) study on MY Access!.
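For transparency, the percentages reported above can be reconciled with raw respondent counts out of n = 26. The following is a minimal sketch, assuming the counts stated or implied in the text (15 and 5 for content; 13, i.e. half of 26, and 4 for vocabulary); the count of 13 is an inference, not a published figure.

```python
# Sketch: converting questionnaire respondent counts to the reported
# one-decimal percentages, assuming 26 respondents in total.
N = 26

def pct(count, n=N):
    """Return the count as a percentage of n, rounded to one decimal."""
    return round(100 * count / n, 1)

# Content: 15 agreed, 5 strongly agreed (counts stated in the text)
print(pct(15))  # 57.7
print(pct(5))   # 19.2
# Vocabulary: half agreed (assumed 13 of 26), 4 strongly agreed
print(pct(13))  # 50.0
print(pct(4))   # 15.4
```

This simple check confirms that the reported 57.7%, 19.2% and 15.4% figures correspond exactly to 15, 5 and 4 respondents out of 26.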
Students were also asked about the usefulness of knowing their CEFR level while editing. The majority of the respondents found it beneficial to know their level on every edit they made (53.8% agreed and 34.6% strongly agreed; see Figure 18). This is not entirely surprising, as it relates strongly to Nicol's (2010) 'forward looking' (how students might improve subsequent assignments) and 'transferable' criteria for effective feedback. The questionnaire ended with an open question requesting students' general suggestions for ways the interface could assist them in developing their writing skills. The comments received clustered in three areas: further pedagogical options and suggestions, combining teacher feedback with online feedback, and technical improvements. One area suggested by some of the students was combining both peer and teacher feedback. The class view option in the interface was added, and the teacher was given access at the end of the study. Students shared the comments they received through the website with the teacher to get further explanations and clarifications of certain forms. While moving around the class and offering feedback to students, the teacher could not respond to every single request for clarification. This might have led some students to suggest granting the teacher access to their writing.
The last type of suggestion from students related to technical improvements to the interface: increasing the word count, adding a new option for new pieces of writing, and having a better highlighting system than the current two colours.

Discussion
The findings suggest that the participants in this study perceived various advantages to an online feedback tool such as Write & Improve, albeit with some significant caveats. As a general first comment, it is useful to consider Roscoe et al.'s (2017) finding that students' perceptions of AWE tools seemed to have minimal impact on their 'in the moment' use of the software to write and revise successfully; however, their perceptions did predict their future intentions of whether to use the software again. In terms of the participants' responses in this study, it is important to recognise upfront that more than half perceived their writing skills to be good. This is likely to have affected their responses to the feedback they received, and therefore to the feedback tool itself, particularly if the feedback suggested that their writing was not as good as they perceived it to be. Specifically, Nicol's (2010) principle of 'non-judgemental' feedback comes into view here, if the automated feedback is focused on performance rather than learning goals.
With regard to the feedback they received, well over half of the participants found the explanations and codes used as part of the automated feedback difficult. This is potentially problematic, because it takes a while to rectify or adjust the explanations and codes used in a tool like Write & Improve. Thus, while the feedback itself is timely once it is in the system, responding to students' experience of the tool is often considerably less agile, and it takes a good evaluation mechanism to first identify an issue and then act on it. 'Understandable' is Nicol's (2010) first principle of effective feedback, a clear indication of its importance. It is therefore important to build into the tool various ways of adjusting the level at which the explanations are pitched.
At the same time, however, a clear majority of the participants perceived the tool to be useful, although this positive response is tempered to some extent by the fact that more than half of them preferred the teacher's feedback when given a choice. This may be due to cultural reasons, in that building rapport and trust with teachers is a very important part of the learning process in a Bahraini context, and it fits closely with Laurillard's (2002) notions of adaptability, discursiveness, interactivity and reflectiveness. This discursive element of feedback is generally easier to satisfy in a face-to-face context and/or in a synchronous online environment where there is a teacher presence; by extension, it is more difficult to achieve as part of an automated feedback tool. This applies to Nicol's (2010) principles of 'non-judgemental', 'forward looking', 'contextualised', and in particular 'personal' feedback. On the face of it, the principle of 'non-judgemental' appears to be a contradiction in this context, as automated feedback would appear to be much better at being 'non-judgemental'. However, the term is used here in Nicol's (2010) sense of focusing the feedback given on learning goals, and not just on performance goals. Teacher presence better allows for a focus on learning goals in the way feedback is framed and personalised.
Interestingly, the aspect of the Write & Improve tool perceived as most useful in helping participants improve their writing was grammar. Grammar is arguably the most 'technical' aspect of writing, so this is not entirely surprising, as grammar feedback may require the least 'discursiveness' in the feedback process. In other words, grammar is probably the most rule-based aspect of writing, and an automated feedback tool is therefore well suited to it, as it allows for clear feedback on whether the rules are being followed or not. A similar argument can be made for the tool providing students with information on their CEFR level. The fact that more than half the participants felt the tool had helped them improve the content of their writing is rather more surprising, as content would presumably benefit more from a discursive feedback process.
When considering the open-ended responses, some interesting patterns emerged. Somewhat unsurprisingly, when asked about desired improvements, some participants asked for more detailed feedback; in particular they wanted examples and options for how to correct their mistakes, exemplified by the request for 'the perfect replacement for the words or grammar'. This is a common student request when it comes to feedback; however, it does not align with Winstone et al.'s (2017) earlier mentioned idea of "proactive recipience of feedback" (p. 31), which has a clearer focus on learning and is geared towards students taking responsibility for their own learning, rather than sitting back and waiting for the teacher (or in this case the automated feedback) to 'solve' their writing for them. One of the most interesting themes to come out of the open-ended responses was the request for a combination of online and teacher feedback. This highlights the potential tension between Nicol's (2010) different principles of effective feedback, and suggests that a complementary approach may work best, at least in a Bahraini context. For example, when one participant says that they "need a tutor because I need a direct answer [about] where I did go wrong", they appear to be referring to the importance of feedback being timely. In a more subtle way, however, this could also be interpreted as referring to the importance of discursiveness, as the kind of feedback this participant describes requires a dialogue in real time in which clarifications can be sought and provided; this is especially important for students who, like this participant, do not feel confident about their English skills. At the same time, this participant did feel that the automated feedback was very helpful, as it was perceived as having helped to improve their CEFR level.
The comments related to technical improvements are probably the easiest to address, in that the system can be adjusted in response to specific requests, for example the suggestion by one participant to increase the word count to allow for longer essays or reports. Again, however, there is a request for "examples of what to write instead of highlighted text", which again implies a reliance on teacher-provided (albeit electronic) answers, rather than self-directed "proactive recipience of feedback" (Winstone et al., 2017). Overall, this recurring theme shows the importance of clarifying expectations in the feedback process. In other words, students need to be carefully prepared for the process if an online tool such as Write & Improve is to be leveraged to its full potential.

Conclusion
In this paper we have presented a case study evaluating the use of an automated feedback tool, Write & Improve, with Bahraini learners of English. In particular, the case study explored how Bahraini students perceived the usefulness of this feedback tool in improving their English writing skills. The responses were interesting when analysed in relation to Nicol's (2010) ten principles of effective feedback, as they revealed some contradictions relating to learning approaches and objectives. For example, effective feedback needs to be timely, which is one of the major strengths of automated feedback, as it is almost immediate. However, effective feedback also needs to be 'non-judgemental', 'contextualised' and 'personal', which is much more difficult to achieve, as it requires a level of teacher presence; as noted above, the 'non-judgemental' element may seem contradictory here, but not if we focus on learning goals (rather than just performance goals) as a key element of feedback. Participant responses in this study showed that students struggle with these contradictions to varying degrees. One suggestion would be to combine automated feedback with teacher-provided feedback, so that all bases are potentially covered. The extent to which each is applied would need to depend on the specific learning context, which means there are no hard rules; for example, factors such as teacher-student relationships in particular learning contexts, such as Bahrain in this case, would shape the particular combination of feedback tools.
The case study is relatively small, so these conclusions are preliminary. Future research could test some of these ideas on a larger, cross-institutional scale, or in different cultural contexts. Furthermore, future research could explore different modes of automated feedback delivery, such as 'gamified' feedback. These avenues are worth exploring, as providing effective feedback is potentially one of the most powerful ways of teaching, and therefore of promoting learning.