ColloCaid: a tool to help academic English writers find the words they need

This short paper summarizes the development of ColloCaid (www.collocaid.uk), a text editor that supports writers with academic English collocations. After a brief introduction, it describes how the lexicographic database underlying ColloCaid was compiled and how text editor integration was achieved, and reports results from initial user studies. The paper concludes by outlining future developments.


Introduction
Research has shown that less experienced users of academic English have a limited repertoire of collocations (Frankenberg-Garcia, 2018). Indeed, collocations like REACH+conclusion are among the most frequent look-ups among novice users of written academic English (Yoon, 2016).
There are a number of tools and resources that academic writers can use to search for such idiomatic combinations of words. These include general English dictionaries and more targeted ones like the Longman Collocations Dictionary and Thesaurus (Mayor, 2013) or the Oxford Learner's Dictionary of Academic English (Lea, 2014). Writers familiar with corpora can also consult general English corpora like the BNC and COCA, and corpora of student papers like BAWE (Nesi, 2011) and MICUSP (Römer & Swales, 2010). Other useful tools include SkELL, arguably the easiest-to-use English corpus available, FlaxLC (Wu, Fitzgerald, Yu, & Witten, 2019), a learner-friendly corpus-based collocation tool, and LEAD (Granger & Paquot, 2015), an academic English dictionary-cum-corpus.
However, writers may not know where or how to look up collocations (Frankenberg-Garcia, 2011), or may simply not realize that their emerging texts could be made more idiomatic (Frankenberg-Garcia, 2014; Laufer, 2011). Moreover, even when writers realize they need help, looking up collocations while writing can be distracting and disruptive (Yoon, 2016).
To address this challenge, we are developing a text editor that assists writers with academic English collocations (Frankenberg-Garcia et al., 2019a). ColloCaid provides writers with collocation suggestions as they write, helping them find idiomatic combinations of words and expand their collocational repertoire. ColloCaid can also be used to revise collocations in existing drafts.

Lexicographic database
The ColloCaid lexicographic database aims to address core collocations used across disciplines in general academic English. As detailed in Frankenberg-Garcia et al. (2019a), it draws on the noun, verb and adjective lemmas that occur in at least two of three well-known academic vocabulary lists: the Academic Keyword List (Paquot, 2010), the Academic Collocation List (Ackermann & Chen, 2013), and the Durrant (2016) subset of the Gardner and Davies (2014) Academic Vocabulary List.
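The "at least two of three lists" criterion can be illustrated with a minimal sketch. The function name `select_lemmas` and the toy (lemma, part of speech) entries below are hypothetical and chosen purely for illustration; they do not reproduce the actual contents of the three vocabulary lists or the authors' selection procedure.

```python
from collections import Counter


def select_lemmas(*word_lists, min_lists=2):
    """Return the entries that occur in at least `min_lists` of the given lists."""
    counts = Counter()
    for word_list in word_lists:
        # set() ensures an entry repeated within one list is counted only once
        counts.update(set(word_list))
    return sorted(entry for entry, n in counts.items() if n >= min_lists)


# Hypothetical toy data: (lemma, part of speech) pairs, NOT the real list contents
list_a = {("conclusion", "noun"), ("reach", "verb"), ("aim", "noun")}
list_b = {("conclusion", "noun"), ("significant", "adjective"), ("reach", "verb")}
list_c = {("significant", "adjective"), ("aim", "noun"), ("actual", "adjective")}

selected = select_lemmas(list_a, list_b, list_c)
for lemma, pos in selected:
    print(lemma, pos)
```

Keying entries on lemma plus part of speech keeps homographs such as a noun and a verb with the same form apart, which matters for the revisions to the selection described next.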
The original selection of lemmas has been revised to (1) disambiguate polysemy (e.g. figure as image, as number and as person); (2) include homographs used in academic contexts (e.g. aim was initially only listed as a noun, but its less frequent verbal lemma was added to avoid the impression that only the noun was idiomatic); (3) discard lemmas that are not collocationally productive (e.g. actual); and (4) add high-frequency interdisciplinary academic lemmas like paper and table, which slipped through initial selection thresholds.
The database was populated with interdisciplinary collocates pertaining to the above lemmas extracted from corpora of expert academic English writing. As detailed in Frankenberg-Garcia et al. (2019a), this was undertaken using Sketch Engine (Kilgarriff et al., 2014), which automatically summarizes the main collocations of a lemma in a corpus. Issues with the extraction have been dealt with using lexicographic judgment on a case-by-case basis. This included, for example, overruling the classification of regard as a verb, since its primary use in academic texts is preposition-like, in contexts such as decisions regarding safety, or in prepositional phrases like with regard to.
The database was further populated with authentic examples of collocations in use, selected according to typicality, informativity, and intelligibility. Examples were also curated to address language production needs and maximize their potential for data-driven learning, as explained in Frankenberg-Garcia (2014).

Text editor integration
Academic writers from different disciplines have their own preferred operating systems and text editors. In our interdisciplinary research team, for example, papers initiated by the linguists are normally drafted in a Windows environment using Microsoft Word, whereas the computer scientists prefer to use Macs and LaTeX editors. To develop a prototype and test it with different users, we opted for an online editor that can be accessed from a standard browser on multiple devices and operating systems, without the need to download additional software. TinyMCE (https://www.tiny.cloud/), a widely used open-source editor that looks like any regular editor, was selected for this purpose (Figure 2: A).
We adopted a dynamic, data-driven learning approach to the integration of the lexicographic data into the editor. It is data-driven because collocation suggestions are shown rather than explained. It is dynamic because collocations are displayed only when wanted, and in as much detail as desired, via progressive interactive menus (Figure 2: B-E).
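One way to picture the progressive menus is as a nested lookup in which each level is revealed only on request. The sketch below is an assumption made for illustration, not ColloCaid's actual data model: the class names, the pattern label, and the example sentence are all hypothetical, and only the REACH+conclusion pairing comes from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Collocate:
    word: str
    examples: list = field(default_factory=list)  # example sentences (strings)


@dataclass
class LemmaEntry:
    lemma: str
    pos: str
    # Maps a collocation pattern label to the collocates filed under it
    patterns: dict = field(default_factory=dict)

    def pattern_labels(self):
        """First menu level: which kinds of collocation are available."""
        return list(self.patterns)

    def collocates(self, pattern):
        """Second menu level: collocates for one chosen pattern."""
        return [c.word for c in self.patterns[pattern]]

    def examples(self, pattern, word):
        """Third menu level: examples for one chosen collocate."""
        for c in self.patterns[pattern]:
            if c.word == word:
                return c.examples
        return []


# Hypothetical entry; the example sentence is invented for illustration
entry = LemmaEntry(
    lemma="conclusion",
    pos="noun",
    patterns={
        "verb + conclusion": [
            Collocate("reach", ["The two studies reached the same conclusion."]),
            Collocate("draw"),
        ]
    },
)

print(entry.pattern_labels())
print(entry.collocates("verb + conclusion"))
print(entry.examples("verb + conclusion", "reach"))
```

Because each method returns only one level of detail, a writer who just needs a reminder of the collocate can stop at the second level, while one who wants to see the collocation in use can go one step further.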

Initial user studies
Development versions of ColloCaid have been tested during university writing workshops and seminars in Brazil, France, Poland, and Spain (Frankenberg-Garcia et al., 2019b). Participants (N=122) included novice and expert L2 English writers from a wide range of disciplines. Due to space restrictions, we present here only the scores obtained on the Brooke (2013) System Usability Scale (SUS). The SUS is a standard for measuring the usability of systems (hardware, software, websites, etc.), with the advantage that its results can be compared on the same scale with hundreds of other systems. It comprises ten alternating positive and negative statements about system usability, which users rate on a Likert-type scale. As shown in Figure 3, the SUS scores obtained for ColloCaid are between good and excellent (and above the SUS average of around 70), despite known bugs and minor issues with the lexicographic database.
Figure 3. Usability scores of ColloCaid v0.1 to v0.3 and interpretation of SUS values (right) according to Bangor, Kortum, and Miller (2009)
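For readers unfamiliar with how a SUS score is derived from the ten ratings, the standard scoring procedure can be sketched as follows. The sketch assumes the usual five-point response format (1 = strongly disagree, 5 = strongly agree); the function name `sus_score` and the sample ratings are hypothetical and are not data from the user studies.

```python
def sus_score(responses):
    """Compute one participant's SUS score (0-100) from ten ratings of 1-5.

    Odd-numbered items are positively worded and contribute (rating - 1);
    even-numbered items are negatively worded and contribute (5 - rating).
    The summed contributions (0-40) are multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if not 1 <= rating <= 5:
            raise ValueError("each rating must be between 1 and 5")
        if item_number % 2 == 1:
            total += rating - 1   # positively worded item
        else:
            total += 5 - rating   # negatively worded item
    return total * 2.5


# Hypothetical ratings from a single participant
print(sus_score([4, 2, 4, 2, 5, 1, 4, 2, 4, 2]))  # 80.0
```

A system-level score is then typically reported as the mean of the individual participants' scores.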

Conclusion and future work
Previous studies on academic writing needs and dictionary use have led us to develop a text editor integrated with a large lexical database of general academic English collocation suggestions, enriched with corpus examples of collocations in use. Our prototype, which draws on the principle of dynamic data-driven learning, has been well received by L2 users of academic English, scoring between good and excellent on the SUS. Future development of ColloCaid includes adjustments to the lexical database (i.e. expanding and proofreading current coverage), experimenting with new ways of visualizing collocations, and further user testing with think-aloud and diary studies.
