Explainable eBook recommendation for extensive reading in K-12 EFL learning

An automatic recommendation system for learning materials in e-learning addresses the challenge of selecting appropriate materials amid information overload and learners' varying self-directed learning (SDL) skills. Such systems can enhance learning by providing personalized recommendations. In Extensive Reading (ER) for English as a Foreign Language (EFL), recommending materials is crucial because of a paradox: learners with low SDL skills struggle to select suitable ER resources, even though ER has the potential to improve SDL. Additionally, determining the difficulty level of ER materials and assessing learners’ progress remain challenging. The system must also explain its recommendations to foster motivation and trust. This study proposes a mechanism, based on information retrieval techniques, that estimates the difficulty of ER materials and adapts to learner preferences, together with an explainable recommendation system for English materials. An experiment was conducted with 240 Japanese junior high school students in an ER program to assess the accuracy of difficulty estimation and to identify the characteristics of learners who were receptive to the recommendations. While the recommendations did not significantly affect learners’ English skills or motivation, they were positively received. A strong relationship was found between the use and acceptance of recommendations and learners’ motivation. The study suggests that although the system did not increase overall motivation, it has the potential to further enhance the motivation of learners who are already motivated.


Introduction
In the context of foreign language learning, Extensive Reading (ER) is a learning method in which students read many texts spontaneously to acquire information or for pure enjoyment (Day et al., 1998). In ER, it is essential for learners to choose materials that suit them and to read at their own pace; both are recognized as effective in helping learners acquire language. Indeed, ER effectively promotes foreign language skill development, especially reading comprehension and vocabulary growth (Schmitt, 2008; Urquhart & Weir, 2014). Because students in ER are free to read books of their choosing at their own pace, ER is also closely related to learners' Self-directed Learning (SDL) strategies. For example, previous studies have shown that ER can influence the process by which learners become self-directed (Enisa et al., 2013; Ningsih, 2019; Takahashi & Umino, 2020). Therefore, ER, like SDL, is closely related to individual learners' attributes such as motivation and autonomy.
However, there is a paradox: learners with low SDL skills find it difficult to select appropriate ER books, even though ER is critical to enhancing these skills. This problem is especially acute in online learning. In e-learning, the number of new digital learning repositories continues to grow, and learners are confronted with an excessive number of online learning resources (Ochoa & Duval, 2009). It is therefore vital for e-learning to provide an inclusive learning environment that allows diverse learners to select materials appropriately.
Difficulties in selecting appropriate materials or exercises arise in every domain. To overcome this problem in general, many studies have developed recommendation systems that automatically recommend the most appropriate learning materials to individual learners. However, eBook recommendation for ER raises several problems. First, it is challenging to estimate the learner's ability well and recommend the right material. Recommendation algorithms that estimate the learner's ability and the item's difficulty from learners' response histories, such as Bayesian Knowledge Tracing (BKT) (Corbett & Anderson, 1994) and Item Response Theory (IRT) (Hambleton & Swaminathan, 2013), cannot be applied to ER. This is because reading activities, unlike quizzes, have no "correct answer," and it is difficult to define what students have learned from a reading material and to what extent. It is also difficult to determine the difficulty level of an English learning material unambiguously. The difficulty of a reading material is determined by a combination of factors, including the English vocabulary it contains (Kasim & Raisha, 2017; Qarqez & Ab Rashid, 2017), its grammar items, and the difficulty of the content itself (Satriani, 2018). Therefore, in the ER context, it is difficult to build recommendation algorithms that estimate both the learner's ability and the item's difficulty, as existing research does. Second, it is difficult to make recommendations that learners accept. Recent research has revealed that when recommendations are black-boxed, learners who do not trust them tend to disagree with them (Abdi et al., 2020).
To address these two problems in eBook recommendation for ER, this study tackles two key challenges. First, we propose an algorithm that estimates the difficulty level of a book by aggregating the difficulty levels of the words it contains.
Second, assuming that learners' preferences are reflected in the difficulty levels of the books they have previously read, we provide recommendations that match the difficulty of eBooks to the learner's characteristics. Combining these two solutions, we propose an explainable English learning material recommendation system for ER in EFL. It estimates learners' preferences from their ER activities and recommends materials that match the estimated preferences. To realize this system, we also present a mechanism for estimating the difficulty level of ER learning materials. The recommendation platform includes an eBook reader system and a vocabulary profile.
Learners' eBook usage logs and vocabulary difficulty levels are processed using TF-IDF, an information retrieval technique, to estimate the difficulty level of the materials and each learner's preferred difficulty level. Using the estimated difficulty and preference, the system recommends learning materials of an appropriate difficulty level. Because it processes learning logs in real time, the system can respond to changes in a learner's preference over time. In addition, the system generates explanations that justify each recommended material based on how close its difficulty level is to the learner's preference. These explanations provide additional information intended to make the recommendations more persuasive, so that they adapt to the learner's preference and encourage spontaneous learning. However, as indicated above, this requires an accurate estimate of the difficulty level of the material.
In addition, for this ER material recommendation to be accepted by more learners and to support learning effectively, it is necessary to understand the characteristics of the learners who supported it. We therefore pose the following two research questions, corresponding to the two challenges defined above:

RQ1: To what extent does the proposed mechanism for estimating the difficulty level of English materials for ER accurately estimate the difficulty level of the materials?
RQ2: What are the characteristics of the learners who supported the explainable recommendations of English language materials for ER?

Extensive reading
Extensive Reading (ER) is a foreign language learning method in which learners read as many books as possible without focusing on unfamiliar words or phrases. Tanaka and Stapleton (2007) conducted a semi-extensive reading program at a Japanese junior high school, where EFL reading input is inadequate, and found that ER significantly increased exposure to reading input. They concluded that ER had positive effects on English language learners. Previous studies on ER have used print (Mason & Krashen, 1997; Mermelstein, 2015; Pitts et al., 1989; Tanaka & Stapleton, 2007) and online materials (Chen et al., 2013; Sun, 2003). Although online materials are less commonly used in ER than traditional print media (Chen et al., 2013), ER programs have been shown to have similar positive effects on learners regardless of the medium. ER has also been shown to improve reading speed (Tanaka & Stapleton, 2007) and reading comprehension (Chen et al., 2013; Mason & Krashen, 1997; Tanaka & Stapleton, 2007), as well as writing (Mason & Krashen, 1997; Mermelstein, 2015) and vocabulary (Pitts et al., 1989).
ER is closely related to learners' SDL strategies because it allows learners to read freely and at their own pace. For example, ER has been shown to influence how learners become self-directed (Enisa et al., 2013; Ningsih, 2019; Takahashi & Umino, 2020). However, although ER is necessary to improve SDL skills, learners with low SDL skills find it difficult to select appropriate ER materials. This is particularly problematic in e-learning.

Personalized recommender systems in e-learning
In contrast to general recommendation systems, which assume the existence of an accurate answer to a recommendation (Bahrani et al., 2024), personalized recommendation in e-learning systems requires knowledge of both the learner and the learning material (Shishehchi et al., 2011). While most state-of-the-art recommender systems for education and learning support use learners' needs, learning styles, or preferences (e.g., Chen et al., 2024), such recommendations should also consider the learning outcomes they produce (Sikka et al., 2012). Knowledge of the learner and of the material must be modeled so that computers can make recommendations that bring about appropriate learning effects. Modeling knowledge about learners can be rephrased as modeling the learner features needed for personalized recommendation. Learner preferences (Bourkoukou & El Bachari, 2018; Hsu, 2008) and learning styles (Klašnja-Milićević et al., 2011; Truong, 2016) are examples of characteristics to be modeled. For example, one study complemented a recommender system with existing open learner models (Abdi et al., 2020).
Modeling the knowledge to be learned is another crucial step in personalized recommendation. The attribute-based material recommendation system developed by Salehi and Kmalabadi (2012) uses a preference matrix that reflects the attributes of the materials and the learner's access log. The vocabulary recommendation of Zou and Xie (2018) likewise models lexical knowledge, using lexical sets to represent the difficulty of each word.
Several studies use ontologies to model domain knowledge relevant to recommendations (George & Lal, 2019; Tarus et al., 2018). An ontology is an explicit knowledge representation format that describes the knowledge items in a domain and their relationships. Flanagan et al. (2019) developed a lexical knowledge map that reflects the words in coursework materials and the semantic contexts in which they are used. This knowledge model has been used in EFL vocabulary recommendation for refugees (Abou-Khalil et al., 2021) and in an English picture book recommendation system (Takii et al., 2021).
However, it is difficult to infer the learner's state of knowledge automatically, because it is hard to determine the difficulty level of materials and to define the knowledge acquired through specific learning behaviors. Zou and Xie (2018) developed an explicit learner profiling model for personalized vocabulary recommendation that models learners' vocabulary size and proficiency level. It allows learners to adjust the recommended vocabulary level but does not infer it automatically. Therefore, this research aims to provide recommendations tailored to learners' preferences regarding material difficulty rather than estimating learners' knowledge states.
The EFL material recommendation in this study uses a content-based method that applies information retrieval techniques to infer automatically the difficulty level of the material and the learner's preference for it. This allows the recommendation system to model learner knowledge and material knowledge simultaneously. The method also supports explanation generation, which is discussed in the following subsection.

Generation of explanations for recommendations in e-learning
Previous research has shown that intelligent tutoring systems with prompting and feedback mechanisms can increase students' motivation in self-regulated learning and lead to higher achievement (Duffy & Azevedo, 2015).Such systems with transparent mechanisms can be persuasive and trustworthy to learners, motivating them and leading to effective learning.
A study by Ooge et al. (2022) on explaining exercise recommendations showed the need for transparency and trust in recommendation systems. Flanagan et al. (2021) proposed EXAIT (Educational eXplainable Artificial Intelligent Tools), a set of e-learning systems designed to help students learn effectively and to address the issue of learner trust in, and motivation behind, recommendations made by e-learning systems. Their recent work developed an educational mathematics exercise recommendation system using the Bayesian Knowledge Tracing algorithm (Takami et al., 2023) and knowledge concepts extracted from textbooks (Dai et al., 2022).
In ER, which is closely related to self-directed learning, the system must motivate learners to perform better. Therefore, material recommendation systems used in ER should provide additional information about their recommendations and be transparent, explainable, and trusted by learners.

Positioning of this study
In the context of EFL learning, this research aims to provide ER material recommendations tailored to learners' preferences regarding material difficulty rather than estimating learners' knowledge states, because, as mentioned earlier, estimating knowledge states from learning behaviors is difficult. The EFL material recommendation in this study uses a content-based method that applies information retrieval techniques to infer automatically the difficulty level of the material and the learner's preference for it. This allows the recommendation system to model the learner's state and material knowledge simultaneously. In addition, the recommendation system aims to ensure transparency by explaining why each material is recommended. We expect this feature to help learners trust the system and, consequently, increase their motivation to learn.

Recommender overview
Figure 1 shows an overview of how the recommendation platform is used. First, the platform administrator prepares in advance a vocabulary list and teaching materials from the teaching materials store. The vocabulary list contains difficulty information for each word and is used to evaluate the difficulty level of the teaching materials. Next, learners use the materials on the eBook reader system. This system converts the reading logs into the format of the xAPI (Advanced Distributed Learning Initiative, 2013), a specification for recording learning experiences. The reading logs, along with the difficulty level information for each material, are sent to the Learning Record Store (LRS) (xAPI.com, 2011), a comprehensive repository of learning/instructional records. Next, the recommendation mechanism generates the materials to be recommended to the learner, along with explanations of the reasons for the recommendations. These explanations include the recommendation weight (how strongly the material is recommended) and the reasons for the recommendation.
Finally, the generated recommendations are presented to the learner through the recommendation interface.
Fig. 1 An overview of the recommendation platform

Platform components
Students carry out ER activities using the BookRoll eBook reader (Figure 2). BookRoll is designed for accessing eBooks or lecture slides inside or outside the classroom (Flanagan & Ogata, 2018). BookRoll tracks students' operations, such as flipping to the next or previous page, and all reader operations are recorded in the LRS. With BookRoll, students' reading behaviors are collected, including the pages and words read, reading time, and reading speed.
The user interface for the recommendation, shown in Figure 3, is implemented on the Goal-Oriented Active Learning (GOAL) system (Yang et al., 2024). GOAL is a platform that supports students' development of data-informed SDL ability (Li et al., 2021; Majumdar et al., 2018). It presents at most five recommended learning materials. Users can jump directly to BookRoll by clicking a title and read the recommended material. Each recommended material is shown with its recommendation weight and explanatory sentences describing why it was recommended to the learner. Learners can show or hide the explanation by clicking the button next to it.

Mechanism overview
The recommendation mechanism proposed in this paper combines model-driven and data-driven approaches. First, as a model-driven approach, the mechanism uses a wordlist with vocabulary difficulty information to evaluate the difficulty of the materials in the book library. Next, as a data-driven approach, it combines reading logs with the estimated material difficulty to estimate learners' preferences for material difficulty. The system then recommends materials based on the material difficulty and the learner's preference, so that the difficulty of the recommended materials is as close as possible to the learner's preference. In addition, explanations of why each material was recommended are displayed to the learner.

Difficulty of vocabulary
The difficulty level of instructional materials is determined by several factors. In this study, we focus on vocabulary difficulty and assume that vocabulary is the primary determinant of the difficulty of instructional materials. In other words, we assume that materials containing difficult vocabulary are generally hard to read, whereas materials containing few difficult words or many easy words are easy to read. However, if the difficult vocabulary is not essential for understanding the material, learners can understand the content without knowing its meaning. Therefore, to estimate the difficulty level of the material, we used a vocabulary profile and a ranking function from information retrieval.
Fig. 3 UI of the picture book recommender system implemented in the GOAL system
This study used the CEFR-J Wordlist Version 1.6 (Tono, 2020) as the reference for vocabulary difficulty. This vocabulary list is based on the Common European Framework of Reference for Languages (CEFR) (Council of Europe, 2001). It was built from a corpus of English textbooks used in China, Taiwan, and Korea, from which the common vocabulary used in texts at each CEFR level in each country/region was extracted, and it was constructed for English language education. The list contains 6868 headwords, classified into four CEFR difficulty levels: A1, A2, B1, and B2.
To calculate the difficulty level of a material, we first quantified the difficulty of each word by mapping A1 to 1, A2 to 2, B1 to 3, and B2 to 4. Because words with multiple meanings have a level for each meaning, their difficulty was set to the average of these levels. For example, "will" has level A1 as a modal auxiliary verb and level B2 as a noun meaning mental power; its difficulty is therefore 2.5, the average of 1 (A1) and 4 (B2). Words that are spelled differently are treated as different words even if they are variants of the same word (e.g., "color" and "colour," both with difficulty A1). Therefore, the difficulty of a word $w$, denoted $D(w)$, satisfies $D(w) \in [1, 4]$.
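The word-level scoring described above can be sketched in Python as follows. This is a minimal sketch: the wordlist entries are a hypothetical mini-excerpt rather than real CEFR-J rows, the helper name `word_difficulties` is ours, and we assume that every sense entry contributes one level to the average.

```python
from collections import defaultdict
from statistics import mean

# Numeric scores for the four CEFR levels in the wordlist.
CEFR_TO_SCORE = {"A1": 1, "A2": 2, "B1": 3, "B2": 4}


def word_difficulties(entries):
    """Map each spelling to the mean score of its sense entries, giving D(w) in [1, 4]."""
    scores = defaultdict(list)
    for word, level in entries:
        scores[word].append(CEFR_TO_SCORE[level])
    return {word: mean(values) for word, values in scores.items()}


# Hypothetical entries (headword, CEFR level of one sense); not real CEFR-J rows.
entries = [
    ("will", "A1"),    # modal auxiliary verb
    ("will", "B2"),    # noun meaning mental power
    ("color", "A1"),
    ("colour", "A1"),  # spelling variant, treated as a separate word
]

D = word_difficulties(entries)
print(D["will"])  # 2.5
```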

Difficulty of materials
The difficulty of each document was calculated using the word difficulties introduced above, the TF-IDF score (Anand & Jeffrey, 2011), and the number of words in the document. In our method, all words in each document are extracted, and the document is regarded as a bag of words. TF-IDF is a ranking function in information retrieval defined for a (document, word) pair; it can be applied to bags of words (sets of words whose elements may be duplicated) and estimates the relevance of a document to a query. The TF-IDF score indicates the relevance of a query word to a document, that is, the importance of the word in the document. In this study, all words in the CEFR-J Wordlist Version 1.6 are treated as query words, and the score is interpreted as the importance of the word: if the TF-IDF score for the document $d$ and the word $w$ is high, $w$ is an essential word for understanding the contents of $d$. In addition, we treat each material as a bag of words consisting only of words in the vocabulary list.
Namely, when the set of all words in the vocabulary list is denoted $V$, the material $d$ is expressed as $d = d' \cap V$, where $d'$ is the bag of words of the material corresponding to $d$.
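A minimal sketch of restricting a material's bag of words to the vocabulary list and scoring it with TF-IDF follows. The source does not specify which TF-IDF variant was used; we assume raw term count multiplied by log(N / document frequency), a simple regex tokenizer without lemmatization, and hypothetical helper names and toy texts.

```python
import math
import re
from collections import Counter


def vocab_bag(text, vocabulary):
    """Bag of words of a material restricted to the vocabulary list (multiplicity kept)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if token in vocabulary)


def tfidf(bags):
    """TF-IDF for every (material, word) pair: raw count * log(N / document frequency)."""
    n_materials = len(bags)
    doc_freq = Counter()
    for bag in bags.values():
        doc_freq.update(bag.keys())  # each word counted once per material
    return {
        material: {w: count * math.log(n_materials / doc_freq[w]) for w, count in bag.items()}
        for material, bag in bags.items()
    }


# Hypothetical vocabulary list and picture-book texts.
vocabulary = {"cat", "dog", "run", "ocean"}
texts = {
    "book1": "The cat and the dog run.",
    "book2": "The dog sees the ocean. The ocean is big.",
}
bags = {name: vocab_bag(text, vocabulary) for name, text in texts.items()}
scores = tfidf(bags)
print(scores["book2"])  # "dog" appears in every book, so its score is 0.0
```

Words outside the list, such as "the", never enter the bag, which mirrors treating only wordlist entries as query words.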
The difficulty of a material $d$ is calculated using the following formula:
where $\mathrm{TFIDF}(d, w)$ is the TF-IDF score for the pair of the material $d$ and the word $w$ as a query, $M$ is the set of all materials, and $c_1$ and $c_2$ are constants. This formula can be interpreted as each word's difficulty, weighted by its significance in the document, contributing to the overall difficulty of the document. In addition, because the number of words in a material is considered to affect its difficulty (the more words it contains, the more difficult it is), the normalized word count is also used. Thus, the difficulty of a document is computed from the sum of the products of each word's difficulty and its significance, together with its normalized length.
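Because the formula itself did not survive in this text, the sketch below is explicitly hypothetical. It computes the two ingredients the paragraph describes, the sum of word difficulty times TF-IDF significance and the material length normalized by the longest material in the collection, and combines them with a weighted sum whose form is our assumption, not the authors' equation. The parameter names `c1` and `c2` and their defaults (0.1 and 1, the constant values reported in the experiment below) are likewise mapped onto this assumed form for illustration only.

```python
def material_difficulty(word_scores, tfidf_row, length, max_length, c1=0.1, c2=1.0):
    """HYPOTHETICAL combination of the two terms described in the text.

    word_scores: word difficulty D(w) for each vocabulary word
    tfidf_row:   TF-IDF score of each vocabulary word in this material
    length:      number of vocabulary-list tokens in this material
    max_length:  largest such length over all materials (used for normalization)
    """
    # Each word's difficulty weighted by its significance in the material.
    vocab_term = sum(word_scores[w] * score for w, score in tfidf_row.items())
    # Material length normalized by the longest material in the collection.
    length_term = length / max_length
    # Assumed weighted sum; the paper's exact equation may combine these differently.
    return c1 * vocab_term + c2 * length_term


# Hypothetical inputs for a single material.
word_scores = {"cat": 1, "ocean": 3}
tfidf_row = {"cat": 0.5, "ocean": 1.0}
print(material_difficulty(word_scores, tfidf_row, length=3, max_length=6))
```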

Learner's difficulty preference estimation
Learners generally use English materials whose difficulty suits their EFL proficiency. Previous research has confirmed a linear relationship between the percentage of known vocabulary in English texts and reading comprehension (Schmitt et al., 2011). As mentioned earlier, it is challenging to determine proficiency directly from learners' use of English materials. However, this relationship suggests that the difficulty of the materials learners use is a good indicator of their EFL proficiency. In this study, learners' use of the learning materials was therefore employed to estimate their preferred difficulty level.
Using the material difficulty defined above, we extracted the five most difficult materials among those each learner had used. We expect the difficulty of these five materials to best reflect the learner's difficulty preference, because learners are expected to move on to more difficult materials as their EFL proficiency improves. The average difficulty of these five materials is then taken as the learner's difficulty preference.
From now on, the difficulty level preferred by the learner $s$ is denoted $P(s)$.
Because only the most difficult materials are averaged, $P(s)$ stays close to the highest difficulty among the materials the learner has used.
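The preference estimate P(s) can be sketched as follows. Assumptions: repeated readings of the same book count once, a learner with fewer than five materials is averaged over what is available, and the function name and data are hypothetical. Recomputing it whenever new reading logs arrive gives the real-time updating described earlier.

```python
from statistics import mean


def difficulty_preference(read_ids, difficulty, k=5):
    """P(s): mean difficulty of the k most difficult distinct materials the learner has read."""
    distinct = set(read_ids)  # assumption: re-reading a book does not count twice
    if not distinct:
        return None  # no reading history yet, so no preference can be estimated
    hardest = sorted((difficulty[m] for m in distinct), reverse=True)[:k]
    return mean(hardest)


# Hypothetical estimated difficulties and one learner's reading log.
difficulty = {"a": 0.2, "b": 0.4, "c": 0.5, "d": 0.6, "e": 0.7, "f": 0.9}
log = ["a", "b", "c", "d", "e", "f", "f"]
print(difficulty_preference(log, difficulty))  # mean of 0.9, 0.7, 0.6, 0.5, 0.4
```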

Explanation for the recommendation
Our proposed recommendation algorithm provides the learner with the recommended materials together with the reasons for recommending them, so that the learner is convinced by the recommendation and motivated to learn. This supplementary information includes the learner's difficulty preference (which indicates their proficiency), the recommendation weight of the material, and a sentence explaining why the material suits the learner.
Table 1 Sentences that explain why the materials were recommended.They depend on the difference between the difficulty of the material and the learner's difficulty preference

$D(d) - P(s)$ | Explanatory sentence
less than -0.3 (very easy) | "This book is easy, but you can learn basic vocabulary with fun from this."
-0.3 to -0.1 (easy) | "This book is a little easy, but you can learn important vocabulary with this book."
-0.1 to 0.1 (average) | "This book is perfect for your English skills!"
0.1 to 0.3 (difficult) | "This book is a little difficult but worth trying!"
more than 0.3 (very difficult) | "This book is challenging. Let's give it a try!"

We prepared five types of sentences explaining why the materials were recommended to the learner. According to the region of proximal learning model, when learners decide whether to study an item, they rely on their belief about whether they already know it: they choose not to study it if they believe they already know it, and vice versa (Metcalfe & Kornell, 2005). The explanatory sentences are therefore designed to lead learners to select the recommended materials. The type of sentence depends on the difference between the difficulty of the recommended material and the learner's preferred difficulty level, i.e., the value of $D(d) - P(s)$. This value is close to 0 when the difficulty of the recommended material $d$ (i.e., $D(d)$) is close to the learner's difficulty preference. When the value is smaller than 0, the recommended material is easier than the learner's level, and when it is larger than 0, the material is more difficult. The recommender provides a different sentence according to this difference, as shown in Table 1.
These messages were written to motivate learners to use the recommended materials.
When the difficulty of a material is close to the learner's estimated preference, the explanation states that the material fits the learner's English level perfectly. Even when the recommended material is easier or more difficult than the learner's level, the explanation states that it is worth using.
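The mapping in Table 1 amounts to a threshold lookup on D(d) - P(s). In this minimal sketch we assume half-open intervals (a gap of exactly 0.1 falls in the "a little difficult" band), since the table does not say how boundary values are handled; the names `BANDS` and `explanation` are hypothetical.

```python
# Upper bounds of D(d) - P(s) for each band in Table 1 (assumed half-open intervals).
BANDS = [
    (-0.3, "This book is easy, but you can learn basic vocabulary with fun from this."),
    (-0.1, "This book is a little easy, but you can learn important vocabulary with this book."),
    (0.1, "This book is perfect for your English skills!"),
    (0.3, "This book is a little difficult but worth trying!"),
]
HARDEST = "This book is challenging. Let's give it a try!"


def explanation(material_difficulty, preference):
    """Return the Table 1 sentence for the gap between material difficulty and preference."""
    gap = material_difficulty - preference
    for upper, sentence in BANDS:
        if gap < upper:
            return sentence
    return HARDEST  # gap of 0.3 or more


print(explanation(0.55, 0.5))  # This book is perfect for your English skills!
```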

Recommendation weight
Recommending materials by difficulty level means recommending the materials whose difficulty level is closest to the learner's estimated English level, that is, to the learner's preferred material difficulty. In other words, recommended materials should be neither too easy nor too difficult, because such materials negatively affect learning effectiveness. In this study, we introduce a recommendation weight $W(s, d)$ for material $d$, i.e., a value indicating how strongly we recommend material $d$ to learner $s$. We define this value as follows:
This formula takes a large value if the difficulty level of the material $d$ (i.e., $D(d)$) is close to the preferred difficulty level of the learner $s$ (i.e., $P(s)$), and a small value otherwise. The larger $W(s, d)$ is, the more strongly material $d$ is recommended to learner $s$. This ensures that learner $s$ is recommended materials whose difficulty is close to $s$'s English skill level. In the implementation, the value of $D(d) - P(s)$ is adjusted so that it is never zero.
The recommendation weights provided are linearly normalized so that the minimum value is 0.0 and the maximum value is 100.0 for ease of understanding by the user.The learner's preference and the difficulty level of the material are similarly normalized.
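The recommendation weight equation is not reproduced in this text, so the following sketch is hypothetical: it uses the inverse of |D(d) - P(s)|, which matches the stated behavior (large when the difficulty is close to the preference, with the gap kept away from zero by a small `eps`), then applies the linear 0 to 100 normalization and keeps the top five, as the GOAL interface displays at most five materials. The actual weighting function may differ, and the function name and data are illustrative.

```python
def recommend(preference, difficulties, n=5, eps=1e-6):
    """Rank materials by a HYPOTHETICAL weight and rescale it linearly to 0-100."""
    # Inverse distance: large when D(d) is close to P(s); eps keeps the gap non-zero.
    raw = {m: 1.0 / max(abs(d - preference), eps) for m, d in difficulties.items()}
    low, high = min(raw.values()), max(raw.values())
    span = high - low
    # Linear min-max normalization to [0, 100] over the candidate materials.
    scaled = {m: (100.0 * ((w - low) / span) if span > 0 else 100.0) for m, w in raw.items()}
    ranked = sorted(scaled.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]  # the interface shows at most five materials


# Hypothetical estimated difficulties of candidate materials and a learner preference.
difficulties = {"a": 0.2, "b": 0.5, "c": 0.55, "d": 0.9, "e": 0.45, "f": 0.6, "g": 0.0}
for material, weight in recommend(0.52, difficulties):
    print(material, round(weight, 1))
```

With an inverse-distance weight, a material whose difficulty almost exactly matches the preference dominates the rescaled range, which is one reason the real formula and its zero-gap adjustment may be shaped differently.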

Method overview
The difficulty estimation of English materials presented in this paper is essential for indicating how strongly each material is recommended to each learner. To show that this mechanism produces accurate estimates, we conducted a simple experiment measuring the difficulty level of English materials. We used English picture books whose CEFR difficulty levels range from A1 to B2/C1. Because these difficulty levels were displayed on the picture books in advance, we could check whether the formula presented above ($D(d)$) works by comparing these labels with the difficulty values estimated by the formula.
We therefore calculated the difficulty level of each material $d$ using the formula and summarized the results by difficulty label. The constants $c_1$ and $c_2$ were set to 0.1 and 1, respectively.

Result
Table 2 shows the number of books and the average estimated difficulty for each difficulty label. The results show that the estimated difficulty increases as the labeled difficulty increases. To confirm that the rank order of the estimated difficulty coincides with that of the labeled difficulty, we calculated Kendall's rank correlation coefficient between the two, which was statistically significant (τ = 0.9820, p < 0.001). This indicates a strong positive correlation between the order of difficulty given by the labels and the order calculated by the formula. We therefore conclude that the formula presented in this study estimates the relative difficulty of the materials in close agreement with their labels. Looking at the estimated values, the estimated difficulty rises substantially from A1 to B1 but only slightly for materials labeled more difficult than B1. This may indicate that vocabulary difficulty strongly affects the overall difficulty of easy materials, but that its effect decreases as the difficulty level increases.
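Kendall's rank correlation with tied labels can be sketched in pure Python as tau-b, the variant that `scipy.stats.kendalltau` computes by default. Whether the authors used tau-b is not stated; the sketch omits the p-value, and the labels (A1 = 1, A2 = 2, B1 = 3) and estimated difficulties are made-up values for illustration, not data from Table 2.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for ties in either ranking."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both rankings: excluded from every term of tau-b
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + tied_x_only) *
                      (concordant + discordant + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Made-up data: CEFR labels as ordinal ranks (A1=1, A2=2, B1=3) and estimated difficulties.
labels = [1, 1, 2, 2, 3, 3]
estimated = [0.30, 0.35, 0.50, 0.55, 0.60, 0.58]
print(round(kendall_tau_b(labels, estimated), 4))  # 0.8944
```

Because many books share a label, pairs within the same label are ties in the label ranking; tau-b keeps such ties from inflating or deflating the coefficient.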

Method overview
This experiment was conducted as part of the GOAL project to support students' independent learning. Based on learners' behavior and perceptions, it aimed to assess the extent to which learners accepted the explainable material recommendations, whether the recommendations influenced their learning behavior, and the characteristics of learners according to their perceptions of the recommendations.

Participants and experimental settings
This experiment was conducted at a Japanese junior high school with an online learning environment provided through Moodle (Moodle.org, 2017). All students at this school had tablet PCs and Internet access at home, which allowed us to track and analyze their learning logs in real time. The experiment involved 120 first-year students and 120 second-year students. The explainable recommendation system was available to all participants, who could use the recommendations at any time during the experiment. Because junior high school is compulsory education in Japan, we did not conduct a comparison experiment with multiple conditions, in order to ensure equal educational opportunities. The first-year students began participating on June 8, 2022, and the second-year students on May 10, 2022. The experiment ended on July 20, just before the summer vacation for first- and second-year students.
A summary of the experimental setting is shown in Table 3.

Materials
When the experiment was conducted, 534 picture books for the ER program were stored in the library and categorized according to the difficulty level of the CEFR.Participating students were free to choose and read at their leisure.

Evaluation
The data obtained in this experiment were evaluated from four perspectives: (1) acceptance of the recommendations according to learning behavior, (2) acceptance of the recommendations according to learner perceptions, (3) impact of the recommendations on learners' learning activities, and (4) characteristics of learners according to their perceptions of the recommendations.
The evaluation used usage logs from the BookRoll and GOAL systems and a post-poll conducted after the end of the ER program. The usage logs were used to evaluate learning behaviors and learning skills. Three indicators were used to evaluate learning behavior: the number of pages read, reading time, and the number of words read. Learning ability was assessed by the learners' reading speed (words per minute). The results of the post-poll were used to assess learners' perceptions of the recommendations.
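The behavioral indicators and reading speed can be aggregated from page-level records as in the sketch below. The `PageView` record is a hypothetical simplification of BookRoll logs (one record per page view with time spent and word count); real xAPI statements would need to be parsed into this shape first, and re-read pages are counted again.

```python
from dataclasses import dataclass


@dataclass
class PageView:
    """Hypothetical page-level record derived from eBook reader logs."""
    student_id: str
    seconds: float      # time spent on the page
    words_on_page: int  # number of words printed on the page


def reading_indicators(views):
    """Per-learner pages, reading time, words read, and words per minute."""
    stats = {}
    for view in views:
        s = stats.setdefault(view.student_id, {"pages": 0, "seconds": 0.0, "words": 0})
        s["pages"] += 1  # re-read pages are counted again
        s["seconds"] += view.seconds
        s["words"] += view.words_on_page
    for s in stats.values():
        minutes = s["seconds"] / 60
        s["wpm"] = s["words"] / minutes if minutes > 0 else 0.0
    return stats


views = [PageView("s01", 30.0, 50), PageView("s01", 30.0, 70), PageView("s02", 45.0, 60)]
print(reading_indicators(views)["s01"])
```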

Post-poll
After the ER program was completed, a post-event survey on the recommender system was conducted. The poll was based on the Technology Acceptance Model (TAM) (Park et al., 2012) and aimed to investigate participants' perceptions of the system's usefulness and ease of use, as well as their attitudes toward it. TAM is commonly used to measure the extent to which a system is accepted by its users in the context of learning and educational support through information technology (Granić & Marangunić, 2019). Each of the three constructs was measured with three question items, for a total of nine items. All items were rated on a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). Table 5 summarizes the poll questions and their classification in the TAM.
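The grouping of the nine items into three constructs can be sketched as below. The item codes follow those used in Table 8 (PU, PEU, AT), but the exact mapping `CONSTRUCTS` and the helper `construct_scores` are hypothetical. Averaging items within a construct is likewise an assumption, not the paper's stated scoring procedure.

```python
from statistics import mean

# Hypothetical item-to-construct mapping: three items per TAM construct
CONSTRUCTS = {
    "PU": ["PU1", "PU2", "PU3"],      # perceived usefulness
    "PEU": ["PEU1", "PEU2", "PEU3"],  # perceived ease of use
    "AT": ["AT1", "AT2", "AT3"],      # attitude toward use
}

def construct_scores(response):
    """Average one respondent's 1-5 Likert ratings within each construct."""
    for item, value in response.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{item} rating {value} is outside the 1-5 scale")
    return {c: mean(response[i] for i in items) for c, items in CONSTRUCTS.items()}

answer = {"PU1": 2, "PU2": 3, "PU3": 4,
          "PEU1": 5, "PEU2": 4, "PEU3": 4,
          "AT1": 5, "AT2": 4, "AT3": 3}
scores = construct_scores(answer)
print(scores["PU"], scores["AT"])  # 3 4
```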

System usage and recommendation acceptance
First, following the method of Takami et al. (2023), we define the terms "access" and "click" as used in this paper. An "access" is an action in which a learner opens the recommendation page of the GOAL system and the system displays five recommended materials. A "click" is an action in which a learner clicks on the title of a recommended material and jumps to its BookRoll page. For example, if a learner opens the recommendation page and clicks on three of the recommended titles, these actions are counted as "one access and three clicks." The numbers of accesses and of clicks on recommended picture book titles were counted to determine how frequently the recommendation system was used. Figure 4 shows the numbers of accesses and clicks per day on the recommendation page, separately for first-year (left) and second-year (right) students. "Total Clicks" in the figure shows the number of times an ER book was opened in BookRoll, regardless of whether a recommendation was used. Of the 240 participants, 100 students accessed the recommendation page at least once, and 46 students clicked on the title of a recommended material at least once. Figure 4 shows that, except for a large spike in mid-June, the recommendations were not used very frequently. This spike may reflect that all participants were introduced to the recommendation function by their teachers and tried it in mid-June. Furthermore, first-year students used ER materials without recommendations more frequently throughout the experiment, and their use also showed a small spike just before the end of the experimental period. In contrast, apart from large spikes in accesses in mid-May and mid-June, second-year students hardly used ER materials without recommendations. Both first-year and second-year students showed spikes just after the start of their experimental periods. Table 6, which shows the daily averages of accesses and clicks, indicates that participants accessed the recommendation page and clicked on recommended titles approximately 4 and 3 times per day, respectively. It also shows that first-year students used the recommendations more frequently than second-year students.
Acceptance rates based on the numbers of accesses and clicks were also calculated to evaluate the degree to which the recommendations were accepted. First, the acceptance rate based on accesses was calculated as the ratio of accesses in which at least one title was clicked to all accesses. For example, suppose a learner accessed the recommendation page three times: on the first access, he/she clicked on three of the recommended titles; on the second, on one title; and on the third, on none. The learner's acceptance rate based on accesses would then be 66.7% (=2/3), since at least one title was clicked in two of the three accesses.
The acceptance rate based on clicks was then calculated as the ratio of the number of recommended titles clicked on to the total number of recommended titles displayed. In the above example, the learner clicked on four of the 15 (=3 × 5) recommended titles, resulting in an acceptance rate of 26.7% (=4/15) based on clicks.
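Both acceptance rates can be computed for a single learner as follows. This minimal Python sketch assumes that each access displays five titles and that the input is simply the number of titles clicked on each access. The function name `acceptance_rates` is illustrative. The sketch reproduces the worked example above.

```python
def acceptance_rates(clicks_per_access, shown_per_access=5):
    """Return (access-based, click-based) acceptance rates for one learner.

    clicks_per_access: number of recommended titles clicked on each visit
    to the recommendation page; each visit shows `shown_per_access` titles.
    """
    accesses = len(clicks_per_access)
    if accesses == 0:
        raise ValueError("learner never accessed the recommendation page")
    # Access-based: share of visits in which at least one title was clicked
    by_access = sum(1 for c in clicks_per_access if c > 0) / accesses
    # Click-based: clicked titles divided by all titles displayed
    by_click = sum(clicks_per_access) / (accesses * shown_per_access)
    return by_access, by_click

# Worked example from the text: 3, 1, and 0 clicks on three visits
by_access, by_click = acceptance_rates([3, 1, 0])
print(f"{by_access:.1%} {by_click:.1%}")  # 66.7% 26.7%
```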
The overall acceptance rate and the acceptance rate for each student were calculated in terms of both accesses and clicks. The results are shown in Table 7. The table shows that more than 40% of the accesses led to clicks on recommended books. However, the average acceptance rate per student was lower than the overall rate, indicating that few students used the recommended books frequently. In other words, many students tried the recommended books, but only a few used them continuously and repeatedly. The fact that the number of clicks was lower than the number of accesses also suggests that few students clicked on more than one title from a single recommendation.
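Why the per-student average can fall below the overall rate is easiest to see with a toy example. The sketch below compares the pooled access-based rate (all accesses counted together) with the mean of per-student rates. The learner data are invented for illustration and do not correspond to Table 7.

```python
def access_rate(clicks_per_access):
    """Share of accesses in which at least one recommended title was clicked."""
    return sum(1 for c in clicks_per_access if c > 0) / len(clicks_per_access)

def pooled_vs_mean(learners):
    """Return (rate over all accesses pooled, mean of per-learner rates)."""
    all_accesses = [c for visits in learners.values() for c in visits]
    per_learner = [access_rate(v) for v in learners.values() if v]
    return access_rate(all_accesses), sum(per_learner) / len(per_learner)

# Hypothetical clicks per access: one heavy user and three occasional users
learners = {
    "s1": [1, 2, 1, 1, 0, 1, 1, 1],
    "s2": [0, 0],
    "s3": [0],
    "s4": [1, 0, 0],
}
pooled, mean_rate = pooled_vs_mean(learners)
print(f"overall {pooled:.1%}, per-learner mean {mean_rate:.1%}")  # overall 57.1%, per-learner mean 30.2%
```

Because the pooled rate weights every access equally, a few frequent users dominate it, whereas the per-learner mean weights every student equally.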

Acceptance by learners' perception
The post-poll results were tabulated for the nine main items shown in Table 5. A total of 203 participants responded to the post-poll. Cronbach's alpha was 0.884 (>0.8), indicating that the responses were internally consistent and reliable. Table 8 presents the post-poll results by question item.
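For reference, Cronbach's alpha for k items is α = k/(k-1) · (1 - Σ item variances / variance of total scores). The following Python sketch implements that standard formula with sample variances. It is not the authors' analysis script, and the input format (one list of item scores per respondent) is an assumption.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` holds one list of k item scores per respondent."""
    k = len(responses[0])
    if k < 2 or any(len(r) != k for r in responses):
        raise ValueError("need at least two items and equal-length responses")
    # Sample variance of each item across respondents
    item_vars = [variance([r[j] for r in responses]) for j in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(r) for r in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Three hypothetical respondents answering two items
print(round(cronbach_alpha([[1, 1], [2, 3], [3, 2]]), 3))  # 0.667
```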
Table 8 shows that the respondents gave high ratings for ease of use (PEU) and attitude (AT) but relatively low ratings for usefulness (PU). Specifically, question AT1, which asked for a general opinion on whether using recommendations to read books was a good idea, received a high score. In contrast, questions PU1 and PU2, which asked how effective and efficient the recommendations were for learning, respectively, received low scores. Although the learners did not feel that the recommendations helped them improve their EFL skills, the results suggest that the recommendations were not complicated to use. Many learners received the recommendations favorably and considered using them an excellent way to learn English. In other words, learners held a positive attitude toward using the recommendations, even though they did not perceive them as improving their English.

Learners' characteristics based on their perceptions
To examine learners' attitudes toward the recommendations and their characteristics, we calculated Spearman's correlations between the post-poll results and indicators of learners' learning activities and recommendation use (number of pages read, time spent reading, reading speed, number of words read, and acceptance rates by number of accesses and clicks). Table 9 shows these correlations. They reveal that learners' perceptions of the recommendations are significantly but weakly positively correlated with the number of pages, time, and words read, which are indicators of learning activity.
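Spearman's rho is the Pearson correlation computed on ranks, with tied values assigned the average of their ranks, which matters for Likert data with many ties. A self-contained sketch follows. In practice a statistics library would normally be used (for example, `scipy.stats.spearmanr`, which also returns a p-value). This sketch computes only the coefficient, not the significance test reported in Table 9.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("a variable is constant; rho is undefined")
    return cov / (sx * sy)

# Hypothetical: pages read vs. an attitude score (ties in the second variable)
print(spearman_rho([120, 45, 300, 80], [4, 3, 5, 3]))
```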
Acceptance rates, however, showed little correlation with learners' perceptions of the recommendations, whether based on accesses or clicks. This result suggests that learners who were highly motivated to engage in reading activities had a more positive attitude toward using recommendations than those who were not, which is consistent with the inferences drawn from the previous results. For a more detailed analysis, the participating students were clustered by their post-poll responses using K-means clustering, and the mean of each indicator was calculated for each cluster. The number of clusters was set to three using the elbow method (see Figure 5). Table 10 presents the results of this analysis. Students in Cluster 1 were the most negative about the recommendations, students in Cluster 3 were the most positive, and students in Cluster 2 were in between. Figure 6 shows that, except for the acceptance rate by accesses, the value of each indicator increases from Cluster 1 to Cluster 2 to Cluster 3. That is, students with more positive perceptions were more active in reading and read recommended books more often than those with less positive perceptions. However, the acceptance rate by accesses was highest in Cluster 1, slightly lower in Cluster 3, and lowest in Cluster 2.
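The clustering and elbow steps can be sketched as follows. This is plain Lloyd's K-means with a deterministic farthest-first initialization. That initialization is a simplification chosen so the example is reproducible, not necessarily what the authors used. The sketch also takes "explained variance" to mean 1 - (within-cluster sum of squares / total sum of squares), which is an assumption about what Figure 5 plots. The toy points stand in for per-respondent poll scores.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def explained_variance(points, centroids, labels):
    """1 - (within-cluster sum of squares) / (total sum of squares)."""
    n, dims = len(points), len(points[0])
    grand = [sum(p[d] for p in points) / n for d in range(dims)]
    total = sum(sq_dist(p, grand) for p in points)
    within = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return 1 - within / total

# Toy (PU, PEU, AT) means for nine hypothetical respondents
points = [(1, 2, 1), (2, 1, 2), (1, 1, 2),
          (3, 3, 3), (3, 4, 3), (4, 3, 3),
          (5, 5, 4), (4, 5, 5), (5, 4, 5)]
for k in range(1, 5):
    cents, labs = kmeans(points, k)
    print(k, round(explained_variance(points, cents, labs), 3))
```

The elbow is read off where adding another cluster stops yielding a large gain in explained variance.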
The fact that students with negative perceptions showed the highest acceptance rate by accesses suggests that these students do need recommendations, but that the recommendations are less effective in motivating them to read. Students in Cluster 3 were probably more engaged in reading before the experiment, and their high click rate suggests that they were familiar with the technology and found the multiple book recommendations helpful.
The estimation results for the difficulty level of English materials showed that, for relatively easy materials, vocabulary level tends to strongly affect overall difficulty, whereas its effect is weaker for harder materials. Components of the difficulty of EFL materials reported in the literature include unfamiliar vocabulary (Kasim & Raisha, 2017; Qarqez & Ab Rashid, 2017) and unfamiliar content and grammar (Satriani, 2018). Since both vocabulary and grammar are crucial in EFL learning, it is plausible that for harder materials the difficulty of grammar and content, rather than vocabulary itself, has the stronger influence on overall difficulty. Therefore, in response to RQ1, the present difficulty estimation system can estimate the difficulty of easier materials more accurately than that of harder ones.
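To make the vocabulary-based estimation concrete, the following Python sketch weights each word in a text by TF-IDF and takes the weighted mean of per-word levels. It is a hypothetical illustration only. The toy `WORD_LEVEL` dictionary stands in for a CEFR-J-style wordlist, and treating unlisted words as the hardest level is an assumption. The paper's actual difficulty equation (evaluated in Table 2) may combine these quantities differently.

```python
import math
from collections import Counter

# Toy word levels (1 = A1 ... 4 = B2); a hypothetical stand-in for a
# CEFR-J-style wordlist, not the list used in the paper.
WORD_LEVEL = {"the": 1, "a": 1, "cat": 1, "sat": 1, "on": 1, "mat": 1,
              "dog": 1, "ran": 1, "castle": 2, "ancient": 3,
              "mysterious": 3, "whispered": 4}
UNKNOWN_LEVEL = 4  # assumption: words missing from the list count as hardest

def tfidf(docs):
    """Per-document TF-IDF weights with tf = count/length, idf = log(N/df)."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    return [{w: (c / len(toks)) * math.log(n / df[w])
             for w, c in Counter(toks).items()} for toks in tokenized]

def estimate_difficulty(weights):
    """TF-IDF-weighted mean word level of one document."""
    levels = {w: WORD_LEVEL.get(w, UNKNOWN_LEVEL) for w in weights}
    total = sum(weights.values())
    if total == 0:  # every word occurs in all documents: use a plain mean
        return sum(levels.values()) / len(levels)
    return sum(weights[w] * levels[w] for w in weights) / total

books = ["the cat sat on the mat",
         "a dog ran",
         "the ancient castle whispered a mysterious tale"]
for text, w in zip(books, tfidf(books)):
    print(f"{estimate_difficulty(w):.2f}  {text}")
```

A word occurring in every document receives zero IDF weight, so shared function words contribute little, and the sketch falls back to an unweighted mean only when all weights vanish.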
Although the difficulty estimation was conducted for extensive reading EFL materials in this study, the results can likely be generalized to other contexts in which the materials and books consist mainly of English text. The difficulty estimation mechanism depends entirely on the content of the materials, not on their intended purpose. It is therefore possible, for example, to measure the difficulty of commercially available materials and of texts written by EFL teachers. Note, however, that this estimation is based on the CEFR-J, which, as mentioned earlier, is vocabulary-dependent and is a difficulty scale for EFL learners. In this study, the difficulty estimation was based on materials with pre-labeled difficulty levels; however, the mechanism also allows us to recommend EFL materials that do not have pre-labeled difficulty levels.
Fig. 6 The breakdown of answers and learners' activities by each cluster
However, there is room for improvement in the method used to generate explanations for recommendations based on the difficulty level of the material. In our work, we arbitrarily set the ranges that determine which explanatory sentence is shown to learners. It is therefore difficult to evaluate the explanations quantitatively without learners' subjective evaluations of the recommendations. A more detailed investigation is required to verify whether these values can really serve as criteria for defining difficulty ranges.
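The range-based selection of explanatory sentences described above can be sketched as a simple threshold lookup on the gap between a material's estimated difficulty and the learner's level. The cut-off values and sentences in `EXPLANATIONS` are hypothetical placeholders, not the ranges used in this study. The sketch shows that these cut-offs are free parameters, which is why they need empirical validation.

```python
# Hypothetical (upper bound on gap, explanation) pairs, checked in order.
# gap = material difficulty - learner level, on the same difficulty scale.
EXPLANATIONS = [
    (-0.5, "This book should feel easy, so you can practice reading fluently."),
    (0.5, "This book is close to your current English level."),
    (float("inf"), "This book is a little challenging and can help you improve."),
]

def explain(material_difficulty, learner_level):
    """Return the first explanation whose upper bound exceeds the gap."""
    gap = material_difficulty - learner_level
    for upper, sentence in EXPLANATIONS:
        if gap < upper:
            return sentence
    raise ValueError("gap is not a finite number")

print(explain(3.2, 3.0))
```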
RQ2: What characteristics of learners supported the recommendation of English language materials for ER that could be explained?
The results of the field experiment in a real educational setting showed that few students used the recommendations repeatedly or continuously. On the other hand, students who used the recommendations were more active in their learning activities, suggesting a positive correlation between students' learning activities and their use of the recommendations. The results also suggest that many students considered using recommendations a good thing and had a favorable impression of them, especially regarding ease of use. This interpretation is supported by the observation that the frequency of use increased markedly when the teachers introduced the recommendations to the students. Furthermore, a small number of students who were highly motivated to read used the recommendations repeatedly, clicking on multiple titles from a single recommendation. Thus, although the recommendations did little to motivate learning, they were attractive, especially to students who were highly motivated to read.
Comparing the use of the recommendation system by first-year and second-year participants, more first-year students accessed the recommendations than second-year students. This may be attributed to curiosity: e-learning systems themselves can arouse learners' curiosity (Sarac et al., 2022). Curiosity may also explain the small spikes just after the start of the experimental period, when learners may have tried the newly introduced function out of interest. In addition, since first-year students who have just entered junior high school are unfamiliar with e-learning systems, they may have been more interested in learning with the system and engaged in more learning activities. However, the effect of the system did not last long, and it did not arouse much curiosity among the second-year students.
We also observed a slight increase in the number of accesses by first-year students just before the end of the experimental period. Since the end of the experiment coincided with the beginning of the summer vacation, we suspect that the teachers' announcements about learning during the summer vacation increased the numbers of accesses and clicks.

Suggestion for the improvement of the recommendation
The results indicate that the explainable English material recommendations were favorably accepted, especially by learners who were highly motivated to learn, but were ineffective in motivating learners in general. This suggests that the explanations, which were intended to motivate learners, did not convey the rationale for the recommendations convincingly enough. The recommendation is based on the difference between the difficulty level of the material and the learner's English level, and the explanation is based on that same difference. However, the explanation consists of a single sentence, which may be too brief for learners to fully understand why a recommendation was made. It is therefore necessary to devise more detailed, precise, and persuasive explanations. For example, information on learners' learning behavior could be utilized, since a large amount of such data accumulates in the learning logs.
Information such as how much of a specific material the learner has read, or whether there are pages the learner reads repeatedly, could be utilized. Since we currently use only data on the vocabulary that learners have read, more detailed information from the learning logs would increase the detail and accuracy of the explanations.
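The two log-derived signals mentioned above (how much of a material has been read and which pages are reread) could be extracted roughly as follows. This is a sketch of the proposed future extension, not part of the current system. The `(material_id, page_number)` page-view log format and both function names are hypothetical.

```python
from collections import Counter

def repeated_pages(page_views, min_views=2):
    """Pages opened at least `min_views` times, most-viewed first.

    page_views: iterable of (material_id, page_number) tuples from a
    hypothetical page-view log.
    """
    counts = Counter(page_views)
    return [(page, n) for page, n in counts.most_common() if n >= min_views]

def coverage(page_views, material_id, total_pages):
    """Fraction of a material's pages the learner has opened at least once."""
    seen = {page for mid, page in page_views if mid == material_id}
    return len(seen) / total_pages

views = [("b1", 1), ("b1", 2), ("b1", 2), ("b1", 3), ("b1", 2), ("b2", 1), ("b1", 3)]
print(repeated_pages(views))      # [(('b1', 2), 3), (('b1', 3), 2)]
print(coverage(views, "b1", 10))  # 0.3
```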
We also found that the effect of the recommendations on learners' English learning was small. We speculate that this is due to the short duration of the experiment and the lack of intensive learning activities, which suggests that further experiments introducing the recommendations in contexts other than ER are needed. The explainable English material recommendation is designed to adapt to various EFL learning contexts.
For example, it can recommend English textbooks, reference books, or vocabulary and grammar quizzes.The learning benefits of this recommendation may become apparent in contexts where learners intensively study English.

Conclusion and future work
This paper introduced an explainable English material recommendation system for EFL learning. The recommendation aims at personalized English learning based on the learner's preferred difficulty and the difficulty level of the material.
Explainable English material recommendations suggest suitable English materials to learners based on their preferences and the difficulty level of the materials. To realize this, we developed a mechanism that automatically estimates the difficulty level of EFL materials using TF-IDF, an information retrieval technique. Using this mechanism, we calculated the difficulty levels of EFL materials for extensive reading and showed that it can measure difficulty correctly. In particular, the mechanism estimated difficulty accurately for relatively simple materials, for which the influence of vocabulary on difficulty was strong.
We then described the recommendation platform and its mechanism and conducted an experiment with 240 junior high school students. The results showed that the recommendations did not affect students' learning motivation. However, they were favorably accepted by students who were highly motivated to learn in the first place, and their use was significantly and positively correlated with students' learning behavior. This indicates that the recommendation system was not effective in motivating learners in general but was readily accepted by naturally motivated learners.
Two challenges remain for improving the recommendations and testing their effectiveness: further use of learning data and experimentation in various contexts. In this recommendation, the difficulty of the material was estimated solely from vocabulary, and the learner's preferred English level was estimated as the average difficulty of the five most difficult materials the learner had read in the past. Thus, the learning data used ignored detailed features of the language, such as grammar and word usage, as well as learners' proficiency with individual vocabulary and grammatical items. The accuracy of the recommendation could therefore be improved by utilizing more detailed learning data. In addition, such data would allow more detailed and persuasive explanations that address the learner's skill level.
This is expected to improve the system's effectiveness in motivating learners to learn.
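The preference estimate described above (the mean difficulty of the five most difficult materials a learner has read) is straightforward to express in code. The ranking step in `recommend` returns the unread materials whose difficulty is closest to that level. This step is an assumption about how the difficulty gap is turned into a list of five recommendations. The catalog format and function names are hypothetical.

```python
def preferred_level(read_difficulties, top_n=5):
    """Mean difficulty of the `top_n` hardest materials the learner has read."""
    if not read_difficulties:
        raise ValueError("no reading history")
    hardest = sorted(read_difficulties, reverse=True)[:top_n]
    return sum(hardest) / len(hardest)

def recommend(catalog, read_ids, level, k=5):
    """Return the k unread materials whose difficulty is closest to `level`.

    catalog: dict mapping material id -> estimated difficulty (hypothetical).
    """
    unread = [(abs(d - level), mid) for mid, d in catalog.items() if mid not in read_ids]
    return [mid for _, mid in sorted(unread)[:k]]

CATALOG = {"b1": 1.0, "b2": 2.0, "b3": 2.3, "b4": 2.6,
           "b5": 3.5, "b6": 2.1, "b7": 4.0, "b8": 2.2}
level = preferred_level([1.0, 2.0, 3.0, 2.5, 1.5, 2.0])  # mean of 3.0, 2.5, 2.0, 2.0, 1.5
print(round(level, 2), recommend(CATALOG, {"b8"}, level))
```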
Future directions include improving the quality and explainability of recommendations by introducing a knowledge model. The only domain knowledge used in this study was lexical profiles. However, it would be possible to make recommendations that take the semantic content of the material into account, for example, by using the semantic associations between words stored in the lexical knowledge map developed by Flanagan et al. (2019). Second, although our study did not focus on learners' preferences and needs, these are important for making appropriate recommendations that motivate learners, and future work should take them into account to achieve better learning outcomes. In addition, as part of future work, we will conduct an experiment comparing the learning effect of recommendations with and without explanatory text. Another direction is the use of learner models that store information about learners' skills. If a complementary relationship can be established, in which the recommendation system uses the learner model's performance information and, conversely, the use of recommendations is reflected back in the learner model, we can expect to improve the quality of both the recommendation system and the learner model.

Fig. 4 Numbers of daily accesses and clicks on the recommendation page

Fig. 5 Explained variance for the clustering. Based on this, we decided on three clusters

Table 2
Results of an evaluation of an equation that computes the difficulty of books

Table 3
Summary of participants and experimental settings

Table 4
The number of picture books classified by CEFR levels

Table 5
List of questions in the post-poll and their TAM constructs. Each question item was rated on a 5-point Likert scale

Table 6
Daily means of the number of accesses and clicks

Table 7
Acceptance rate of the recommendation. Values were calculated from the number of accesses to the recommendation page or from the number of clicks on the titles of recommended books

Table 8
Summary of the post-poll results. This shows the number of participants who answered each question with 1 (strongly disagree) to 5 (strongly agree) and the mean value of the answers for each question

Table 10
The clustering result using the poll answers (mean (std.)). It also shows the breakdown of answers and learners' activities by cluster