Generalization support environment for understanding ways to use English words

When we translate a Japanese sentence into English, several English words sometimes become candidates for the same Japanese word. However, the usage situations of these candidate words are not the same. To choose the appropriate word among them, we need to understand the usage situation of each candidate word. The usage situation of a word can be inferred from the co-existing words in its example sentences. Because the co-existing words differ from one example sentence to another, understanding the usage situation requires generalizing the co-existing words across several example sentences. However, those of us who do not consciously generalize the co-existing words do not acquire the usage situation. This paper proposes a system that provides an environment in which we can explicitly generalize the co-existing words (keywords) in example sentences to acquire the usage situation of the target words. The system also has a generalization support mechanism that provides the concepts of words, acquired through WordNet, as hints. According to the experimental results, participants who used the system when learning English words chose incorrect words less often and were encouraged to derive their own understanding of the usage situation.


Introduction
When we learn a second language, it is sometimes difficult to understand the usage situation of words, especially when more than one word has a similar meaning. For example, the English words create, make, build, and design are all translated into the same Japanese word tsukuru, but their meanings are slightly different. If we understand these English words only through the corresponding Japanese word, we cannot tell them apart and, hence, cannot use them in the correct situations.
Various multimedia contents, such as pictures and animation, were introduced to facilitate vocabulary acquisition (Chen & Hsieh, 2008; Sun & Dong, 2004). Gamification was also introduced to make the memorization task enjoyable (Hasegawa, Koshino, & Ban, 2015; Smith et al., 2013). These approaches might promote memorization of the meanings of English words, but they did not support a deep understanding of the words that includes their usage situations. Some studies have tried to provide learning environments in which we can learn words together with their usage situations. Ogata and Yano developed a ubiquitous language learning support system that teaches expressions that fit the real-world situation in which the ubiquitous device is used (Ogata & Yano, 2004). Nishihara et al. used comics for learning role words in Japanese and created quizzes from scenes in a comic book (Nishihara, Matsuoka, & Yamanishi, 2018). These systems encouraged us to learn words according to our context. However, they did not explicitly discriminate between the usage situations of similar words.
Rawson et al. argued that concrete examples illustrating how abstract concepts can be instantiated in real-world situations support the learning of declarative concepts (Rawson, Thomas, & Jacoby, 2015), and Nation suggested introducing example sentences so as to master words through reading (Nation, 2001). An example sentence contains both the word that we want to learn and the words from which we can understand its usage situation, so learning with example sentences leads to an understanding of the usage situation. Samia and Abdelkrim suggested memorizing individual example sentences in English word learning (Samia & Abdelkrim, 2012). This learning method aimed at memorizing typical phrases that appear in example sentences, not at grasping the usage situation of the words. Therefore, this method did not let us acquire the knowledge needed to apply the words in different phrases.
The usage situation of a word can be inferred from the co-existing words in its example sentences. Because the co-existing words differ from one example sentence to another, understanding the usage situation requires generalizing the co-existing words across several example sentences. However, those of us who do not consciously generalize the co-existing words are not able to understand the usage situation. Benson and Lor proposed learning steps for understanding the meaning of words, including their usage situation (Benson & Lor, 1999). They argued that generalizing the meaning of an English word is important for a deep understanding of the word. Matsubara et al. suggested that observing several example sentences promotes understanding of the usage situation of words and proposed a retrieval system for example sentences (Matsubara, Kato, & Egawa, 2008). This system implicitly expected us to generalize the situations of the example sentences that it returned, but it did not explicitly support the generalization.
Learning from example sentences is regarded as one learning style within discovery learning. In discovery learning, we try to find general knowledge by generalizing from the observed targets (Johns, 2010; Swaak, Jong, & Joolingen, 2014; Yamashita et al., 2016). Several studies tried to support discovery learning by providing a simulation environment in which we can check whether our generalization is appropriate (Veermans & Joolingen, 2004; Rieber, Tzeng, & Tribble, 2004). However, they did not support the generalization process itself.
Generalization is regarded as one of the important activities in learning (Rivera, 2014). However, since the generalization process is usually not trained in school, it is difficult for us to get into the habit of generalizing words every time we learn with example sentences. McIntosh and MacKay tried to establish curricula for acquiring social skills. Their curricula present several situations in which the social skills are used, which encourages students to generalize their skills and apply them to different situations. However, how students generalize their skills remains implicit, so the generalization process is not directly supported (McIntosh & MacKay, 2008). The aim of this research is to propose a method that makes us explicitly execute the generalization process on example sentences in order to acquire the usage situation. The generalization process consists of two steps: extracting from the example sentences the keywords that may reflect the usage situation, and selecting the common characteristics of the keywords and replacing them with one generalized word.
As the first step of our research, this paper aims at making users experience this generalization process and understand its effectiveness. This paper proposes a system through which we can externalize our generalization process. In the system, we can manage the generalized words as a graph structure. Externalizing our own generalization process through the system makes it easier for us to derive the generalized words and to consider the importance of the generalization. However, we are not always able to derive the generalized words easily, so a support method for generalizing words may be needed. To cope with this problem, our system provides conceptual knowledge of words acquired from a concept dictionary called Japanese WordNet (Bond, 2018). In the experimental evaluation, the proposed system helped participants consider the usage situation of English words. In addition, the mechanism that provides the concept dictionary contributed to deciding whether the generalized words were appropriate. Since the number of participants was small, further evaluation is needed to prove the effectiveness of the system.
Currently, the quality of the derived generalized words is not addressed. To acquire the correct usage situation, the viewpoint of the generalization is important. If the generalized words do not relate to the correct usage situation, the target English words will not be used appropriately. Our future work will tackle this problem and develop a mechanism for deriving appropriate generalized words so as to reach the correct usage situation.

Overview of generalization-based learning support system
Learning method for understanding usage situation of words
In order to learn the usage situation of an English word from its example sentences, it is necessary to (step 1) extract the words expressing the situation from the example sentences as keywords and (step 2) generalize them in our own words as generalized words.
In step 1, keywords are determined according to the part of speech of the learning word. The words that become keywords for each part of speech are shown in Table 1. If the learning word is a verb, the object of the verb represents the usage situation of the learning word. If the learning word is an adjective or an adverb, the word that it modifies becomes the keyword. For example, let us assume that the learning word is build and its example sentence is "We build a house." Since the usage situation of a verb is determined by its object, the word house becomes the keyword.
In step 2, generalization is the process of replacing the target words with a new word that expresses the common characteristics of the target words. A generalized word should cover the characteristics of all keywords of the learning word, but should not cover the keywords of other words with a similar translation. To derive such generalized words, we sometimes need to generalize the generalized words as well, so that the result covers the generalized words of all keywords. Figure 1 shows an example of practicing this learning step. In this example, the learning words are bake and generate, both representing tsukuru (making) in Japanese. The encountered example sentences are as follows: "I bake cake.", "I bake bread in the oven.", and "I generate an electricity." In step 1, the objects of these verbs, namely cake, bread, and electricity, are selected as keywords from the example sentences. In step 2, generalized words are generated from the specified keywords. In this example, baked goods is generated from cake and bread, energy is derived from electricity, and intangible is generated from energy. With this understanding, we can choose bake when creating a sentence whose object is potato.
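The two steps can be illustrated with a minimal Python sketch. It is not the implementation of the proposed system: the `Example` structure, the function names, and the `discriminates` check are hypothetical, the grammatical roles are assumed to be annotated by hand, and the generalizations are those of the Figure 1 example as a learner might enter them.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One example sentence, annotated by hand (hypothetical structure)."""
    learning_word: str   # word being learned, e.g. "bake"
    part_of_speech: str  # "verb", "adjective", or "adverb"
    obj: str = ""        # object of the verb, if the learning word is a verb
    modified: str = ""   # word modified by an adjective or adverb


def extract_keyword(ex):
    """Step 1: choose the keyword according to the part of speech."""
    if ex.part_of_speech == "verb":
        return ex.obj
    if ex.part_of_speech in ("adjective", "adverb"):
        return ex.modified
    raise ValueError(f"no keyword rule for {ex.part_of_speech!r}")


def keywords_by_word(examples):
    """Group the extracted keywords under their learning word."""
    grouped = {}
    for ex in examples:
        grouped.setdefault(ex.learning_word, []).append(extract_keyword(ex))
    return grouped


def ancestors(word, generalized):
    """Step 2 helper: follow learner-supplied generalizations upward."""
    chain = []
    # The second condition guards against accidental cycles in the mapping.
    while word in generalized and generalized[word] not in chain:
        word = generalized[word]
        chain.append(word)
    return chain


def discriminates(label, target, keywords, generalized):
    """True if `label` covers every keyword of `target` and no keyword
    of any other learning word."""
    for learning_word, kws in keywords.items():
        covered = [label in ancestors(k, generalized) for k in kws]
        if learning_word == target and not all(covered):
            return False
        if learning_word != target and any(covered):
            return False
    return True


examples = [
    Example("bake", "verb", obj="cake"),
    Example("bake", "verb", obj="bread"),
    Example("generate", "verb", obj="electricity"),
]
# Generalizations entered by the learner (taken from the Figure 1 example).
generalized = {
    "cake": "baked goods",
    "bread": "baked goods",
    "electricity": "energy",
    "energy": "intangible",
}
keywords = keywords_by_word(examples)
print(keywords)
print(discriminates("baked goods", "bake", keywords, generalized))
```

In this sketch the generalized words themselves are always supplied by the learner; the code only records them and checks whether a label separates one learning word from the others.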
Of course, keywords have more than one common characteristic. If characteristics that do not relate to the usage situation are selected and a generalized word is derived from them, the correct usage situation of the learning word cannot be acquired. For example, if sweet is derived from cake and bread, it may be difficult to reach baked goods, and bake cannot be selected when making a baked potato. The level of generalization is also important. For instance, if food is generalized from baked goods, we may conclude that all foods can be used with bake. However, it is inappropriate to use bake with jelly or soup. Therefore, in the generalization process, appropriate characteristics should be selected and an appropriate level of generalization should be chosen.
In this research, as a first step toward acquiring the usage situation based on generalization, the aim is to let us get used to the generalization process. The correctness and appropriateness of the generalized words are not considered, and the development of a support mechanism that leads to correct generalized words remains as our future work.

System overview
Figure 2 shows the overview of the system that supports the generalization of keywords.
This system provides an interface through which learning words, keywords, and generalized words can be organized in a graph structure. The graph is called a classification graph, and the interface is called a classification graph interface. Expressing the relationships among learning words, keywords, and generalized words as a graph structure may promote the generalization and makes the comparison between words easier. It also lets us see how the usage situation was acquired. The created classification graph is stored as classification graph data.
In addition, the generalization support mechanism provides hints for users who have difficulty in deriving generalized words. For such users, presenting the meaning of the words in the classification graph may become a trigger for coming up with generalized words. The system holds a concept dictionary and presents the meaning of a word on request.

Classification graph interface
The classification graph is a form for organizing learning words, keywords, and their generalized words. It consists of nodes and links. The nodes represent words and have a type: "learning word," "keyword," or "generalized word." A link represents a derivation relationship. That is, the link between a "keyword" and a "learning word" indicates that the "keyword" is derived from the "learning word." In the same way, the link between a "generalized word" and a "keyword" shows that the "generalized word" is derived from the "keyword." Figure 1 in the "Overview of generalization-based learning support system" section is a classification graph. The nodes in the first layer correspond to learning words and those in the second layer are keywords. The nodes in the deeper layers represent the generalized words.
We have developed a system through which users can easily create this classification graph. The system is implemented in the programming language C#. Figure 3 shows the classification graph interface. Through this interface, users can create a classification graph by inputting learning words, keywords, and generalized words. The interface is composed of a classification graph display section for displaying a classification graph and a keyword input section for inputting a learning word, a translation of the learning word, an example sentence, and the keywords extracted from the example sentence. The classification graph display section displays the classification graph created by the user. The "learning word" node is shown in yellow, the "keyword" node in blue, and the "generalized word" node in green. Links are drawn in black. When the user inputs the learned example sentences and the keywords extracted from them into the keyword input section and pushes the add button, the nodes representing the learning words and keywords are displayed in the classification graph display section, and links are added between them. Each node shows its word and an ID assigned automatically by the system. The created nodes can be removed or modified through the keyword input section. In addition, all nodes in the classification graph display section are erased by pushing the reset button.
When the generalization button of the keyword input section is pushed, a generalization interface for creating generalized words appears (Fig. 4). In the keyword generalization section, a new generalized word is created from keywords. By specifying the keywords to be generalized, entering a generalized word for them, and pushing the decision button, a node representing the generalized word is generated in the classification graph. In the re-generalization section, a new generalized word can also be derived from existing generalized words. When the generalized words to be generalized and the newly created generalized word are input and the decision button is pushed, a new generalized word node is generated and linked with the original generalized word nodes.
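The structure of the classification graph described in this section can be sketched as a small data model. The actual system is implemented in C#; the Python sketch below is only an illustration, and the class names, method names, and ID scheme are assumptions, as is the choice to reuse an existing learning word node when a second example sentence is added for the same word.

```python
from dataclasses import dataclass, field
from itertools import count

LEARNING, KEYWORD, GENERALIZED = "learning word", "keyword", "generalized word"
# Display colors of the three node types as described in the text.
NODE_COLORS = {LEARNING: "yellow", KEYWORD: "blue", GENERALIZED: "green"}


@dataclass
class Node:
    node_id: int  # ID assigned automatically
    word: str
    kind: str     # LEARNING, KEYWORD, or GENERALIZED


@dataclass
class ClassificationGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    links: list = field(default_factory=list)  # (source_id, derived_id) pairs
    _ids: count = field(default_factory=lambda: count(1),
                        repr=False, compare=False)

    def _add_node(self, word, kind):
        node_id = next(self._ids)
        self.nodes[node_id] = Node(node_id, word, kind)
        return node_id

    def _find(self, word, kind):
        for node in self.nodes.values():
            if node.word == word and node.kind == kind:
                return node.node_id
        return None

    def add_example(self, learning_word, keywords):
        """Add a learning word and the keywords taken from one example."""
        lw_id = self._find(learning_word, LEARNING)
        if lw_id is None:  # assumption: one node per learning word
            lw_id = self._add_node(learning_word, LEARNING)
        kw_ids = []
        for kw in keywords:
            kw_id = self._add_node(kw, KEYWORD)
            self.links.append((lw_id, kw_id))
            kw_ids.append(kw_id)
        return lw_id, kw_ids

    def generalize(self, source_ids, generalized_word):
        """Create a generalized word node from keywords, or from existing
        generalized words (re-generalization)."""
        for sid in source_ids:
            if self.nodes[sid].kind == LEARNING:
                raise ValueError("learning words are not generalized directly")
        g_id = self._add_node(generalized_word, GENERALIZED)
        self.links.extend((sid, g_id) for sid in source_ids)
        return g_id

    def reset(self):
        """Erase all nodes and links, as the reset button does."""
        self.nodes.clear()
        self.links.clear()
        self._ids = count(1)


graph = ClassificationGraph()
bake_id, (cake_id, bread_id) = graph.add_example("bake", ["cake", "bread"])
goods_id = graph.generalize([cake_id, bread_id], "baked goods")
print(graph.nodes[goods_id], graph.links)
```

Storing links as (source, derived) pairs keeps the layers of the graph implicit: a node's depth follows from the chain of links that leads back to a learning word.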

Generalization support mechanism
In order to derive generalized words, we need to know the meanings of the words to be generalized. However, depending on our vocabulary skills, we sometimes do not understand the meaning of the words and cannot produce generalized words. To cope with this situation, this research introduces a concept dictionary that produces hints for deriving generalized words by helping us understand the meaning of the words. Various concept dictionaries have been developed, and this research uses Japanese WordNet as the concept dictionary (Bond, 2018). Japanese WordNet is a translation of WordNet developed by Princeton University (Princeton University, 2018). It contains not only Japanese but also English descriptions of the concepts of each word. By using Japanese WordNet, the meanings of both generalized words written in Japanese and keywords written in English can be acquired. Among the various information provided by WordNet, the system provides the concepts of the input words as hints. Let us assume that we try to generalize the meanings of "cake" and "bread." The concept of bread is "food made from dough of flour or meal and usually raised with yeast or baking powder and then baked" and that of cake is "baked goods made from or based on a mixture of flour, sugar, eggs, and fat." In these concepts, the words "bake" and "flour" appear in common. Therefore, possible generalized words are "something made from flour" or "baked goods." Although WordNet has superordinate concepts, the system does not provide them. This is because the superordinate concept of a single word does not always apply to all keywords of the learning word. In addition, considering generalized words by ourselves may encourage us to create our own interpretation of the usage situation.
The proposed generalization support mechanism is integrated into the classification graph interface (Fig. 3). When the concept search button in the concept search section is pushed, a window for searching the concept knowledge is displayed (Fig. 5). When the type of the word and its node ID are input, the concept of the selected word is acquired from WordNet and shown in the concept search section in Fig. 3.
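The way two concept descriptions can suggest a generalized word may be illustrated as follows. This is a sketch for explanation only: in the system described above, the learner compares the presented concepts by reading them, and no automatic comparison is claimed. The two descriptions are the ones quoted in this section and are hard-coded here rather than retrieved from Japanese WordNet; the stopword list and function names are assumptions.

```python
import re

# Concept descriptions quoted in the text (hard-coded for this sketch).
GLOSSES = {
    "bread": "food made from dough of flour or meal and usually raised "
             "with yeast or baking powder and then baked",
    "cake": "baked goods made from or based on a mixture of flour, "
            "sugar, eggs, and fat",
}

# Small hand-picked list of function words (an assumption of this sketch).
STOPWORDS = {"a", "an", "and", "as", "from", "of", "on", "or",
             "the", "then", "with"}


def content_words(gloss):
    """Lower-case the description and drop function words."""
    return {w for w in re.findall(r"[a-z]+", gloss.lower())
            if w not in STOPWORDS}


def shared_hint_words(glosses):
    """Words that occur in every description; candidates for a hint."""
    sets = [content_words(g) for g in glosses]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


hints = shared_hint_words([GLOSSES["bread"], GLOSSES["cake"]])
print(hints)  # ['baked', 'flour', 'made']
```

The shared words correspond to the hints discussed above: "baked" points toward "baked goods" and "made" together with "flour" toward "something made from flour." The learner still has to decide which of them relates to the usage situation.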

Experiment
An evaluation experiment was conducted to investigate the effectiveness of the proposed learning method, the learning support system, and the generalization support mechanism. The evaluation experiment was carried out in two parts (experiments I and II). Experiment I evaluated whether the generalization process and the classification graph interface can promote understanding of the usage situation of English words. The change in learning behavior when learning English words was also examined. Experiment II verified the effectiveness of the generalization support mechanism in the system.

Experiment I Setting
We verified the effectiveness of the generalization process and the classification graph interface for understanding the usage situation of English words. The experiment was conducted with eight undergraduates in our university (participants a to h). Figure 6 shows the experimental procedure. To assess the participants' understanding of the usage situation of English words, a pre-test was carried out. The pre-test consisted of two kinds of tests. One was a multiple-choice test on creating the English translation of Japanese sentences (Fig. 7). In this test, a Japanese sentence and its English translation were given. One word in the English translation was left blank, and choices for the blank were given. Table 2 shows the Japanese words that correspond to the blank, the English words that were prepared as the choices, and the usage situations of the English words. The usage situations were defined based on articles that explain the differences between similar English words. One question was prepared for each Japanese word, and nineteen questions were given. The other test asked for the usage situation of each English word (Fig. 8). English words with the same Japanese translation were presented, and participants were asked to describe their usage situations in free-description form. After the pre-test, sets of English words that were not fully understood were selected as learning words for each participant. Table 3 shows the three Japanese translations of the English words that were selected as learning words.
In learning 1, participants were asked to learn the usage situation of word A from given example sentences without the system. They could learn in their own style with the given example sentences, but they were not allowed to consult other learning materials, including web pages and dictionaries. Two to three example sentences were prepared for each English word and were given to participants with their Japanese translations (Fig. 9). Learning was terminated when the participants felt that they could use word A properly. Next, in learning 2, example sentences of word B were given as in learning 1, and participants were asked to learn with the system but without the generalization support mechanism. At the beginning of the learning, the learning word nodes and keyword nodes were already given in the classification graph interface, and participants were asked to create the generalized words from them. Learning was terminated when the participants felt that they could use word B properly. In the post-test, participants were asked to modify their pre-test answers for learning words A and B if they wanted to change them. From these changes, the effectiveness of the generalization process and the system was evaluated. After that, in learning 3, participants were asked to learn word C without the system. This was done to investigate whether the learning attitudes of participants were changed by learning with the system. Therefore, the questionnaire asked about their learning attitudes, as well as their impression of the system. Table 4 shows the items of the questionnaire.

Result
Table 5 shows the improvement rates of the multiple-choice question test and the usage description test for words A and B. For the multiple-choice question test, 1.0 means that the participant corrected all answers that were incorrect in the pre-test, and '-' means that the participant's answers in the pre-test were all correct and there was no room for improvement. Our system does not support deriving correct generalized words. It aims at letting participants form their own interpretation of the usage situation. Thus, the improvement rate of the usage description test is the ratio by which blank answers decreased. From Table 5, participants b, e, and h improved their scores in the multiple-choice question test, and participants c and g decreased the number of blank answers in the usage description test. According to the result of the one-sample t test for the multiple-choice question test, there was a significant difference between the learning result without the system and that with the system (t(11) = 1.951, p < 0.05). On the other hand, the result of the one-sample t test for the usage description test indicates a marginally significant tendency between the learning result without the system and that with the system (t(13) = 1.549, 0.05 < p < 0.1). These results suggest that generalizing the keywords in the example sentences helped participants understand the usage situation.
Table 6 shows the learning result for word B using the system. In Table 6, "a" is the number of English words, "b" is the number of derived generalized words, and "c" is the number of derived generalized words that generalize all keywords of their learning word. In addition, for Table 6 (b) and 6 (c), the number of derived generalized words that lead to the correct usage situation is indicated in parentheses. According to Table 6 (b), about 52% of the derived generalized words do not relate to the correct usage situation. In addition, from Table 6 (c), about 48% of the generalized words that are derived from all the keywords do not represent the usage situation. These inappropriate generalized words are attributable to the prepared example sentences. For example, since the objects of "mend" in the example sentences are doll and skirt, many participants derived nuno (cloth in English) as the generalized word. The correct answer for "mend" is "simple damage." If example sentences with other simple objects, such as shoes or chairs, had been given, participants might have reached the correct generalized words. Currently, our system only provides an environment in which we can explicitly practice the generalization process. However, this environment does not ensure the acquisition of the correct usage situation. We need to devise a mechanism that encourages participants to check the appropriateness of generalized words by comparing them with other example sentences.
Table 7 shows the answers to questionnaire item 1. Many participants focused on the words related to the learning words and on their differences. They also tried to generalize the related words. However, learning 3 was conducted just after the post-test, so they might have been influenced by learning 2. We need to conduct further experiments to investigate whether participants' learning attitudes really changed.
Table 8 shows the number of participants' answers for questionnaire items 2 and 3. From the results of item 2, all participants could select the keywords easily. On the other hand, from the results of item 3, half of the participants had difficulty in creating the generalized words. Some participants commented, "I could not imagine the kinds of words to create.", "I could not have confidence for the generalized words that I have created.", and "I had difficulty in deriving generalized words for some keywords." These comments suggest the necessity of the generalization support mechanism.

Experiment II Setting
The effectiveness of the generalization support mechanism was evaluated. The experiment was conducted with eight undergraduates in our university (participants i to p). Figure 10 shows the experimental procedure. In the experiment, participants were asked to answer the pre-test, which was the same as that in experiment I. After the pre-test, learning word D was decided according to their pre-test scores. Table 9 shows the learning words for each participant. Participants were asked to learn learning word D using the given example sentences and the system. However, in this phase, they were not allowed to use the generalization support mechanism. In questionnaire 1, they were asked to answer the question "Were you able to derive the generalized words easily?" The answer was selected on a four-point Likert scale, where 1 is the worst and 4 is the best. After questionnaire 1, participants who had difficulties in creating the generalized words were allowed to use the generalization support mechanism. Participants who used the mechanism were asked to write down the searched words, the created generalized words, and the words that were used to derive the generalized words. Learning was terminated when the participants felt that they could use word D properly. In the last questionnaire, they were asked the same question as in questionnaire 1, and the results were compared with those of questionnaire 1 so as to evaluate the effectiveness of the generalization support mechanism.

Result
Four participants (i to l) used the generalization support mechanism and carried out learning 5. Table 10 shows the questionnaire results of these participants, and Table 11 displays the searched words, the created generalized words, and the words from which the generalized words were created. In their classification graphs, 18 generalized words had been created before the generalization support mechanism was used. Among them, 13 were words that are included in WordNet. The other words are compound words such as "something big" and cannot be searched in WordNet. However, the words searched by participants i, j, and k were keywords, not the generalized words that the participants had derived. We consider that, because generalized words are created by the participants themselves and their meanings are familiar to them, most participants do not search for generalized words. Instead, they tend to search for the meaning of the given keywords, which are usually included in WordNet. This indicates the appropriateness of introducing WordNet as the generalization support mechanism.
From Table 11, three participants could derive generalized words. For all three participants, words in the search results were used to derive the generalized words. These participants all answered 4 in questionnaire 2. They commented, "The mechanism was useful because I could confirm the Japanese translation of unknown words," and "To compare meaning of words helped me." On the other hand, participant l could not create new generalized words. He commented, "I tried to create generalized words using the results of the searched words. However, after observing the search result, I found that searched words were not appropriate for keywords." Although new generalized words were not created, the searched result contributed to the decision that the selected words were not

Conclusion
This research proposed a learning support system for understanding the usage situation of English words through the generalization of keywords in example sentences. The learning system provides an interface in which we can explicitly practice the generalization process and easily manage the generalized words. In addition, for people who have difficulties in generalizing words, a generalization support mechanism that provides concept knowledge of words from WordNet was proposed. In the evaluation experiments, which targeted verbs, participants were able to select appropriate words and to form their own interpretation of the usage situation of the English words. These results suggest that generalizing the keywords that accompany learning words in example sentences may be effective for understanding the usage situation of English words. In the experiment on the generalization support mechanism, 3 out of 4 participants were able to derive new generalized words. These participants said that the concepts of words given by the mechanism helped them notice the generalized words. Since the numbers of participants in both experiments were small, we need further experiments to evaluate the effectiveness of our system and the learning method.
The current system only provides the interface for practicing the generalization process and does not support deriving appropriate generalized words. In generalization, the common characteristics of the keywords are found and replaced with one word or phrase. If common characteristics that do not relate to the usage situation are selected, they do not indicate the correct usage situation. One solution to this problem is to provide a large number of example sentences. If there are a large number of keywords, the number of their common characteristics is small and the derived generalized word may relate to the usage situation. To realize this solution, we need to devise a mechanism for providing many example sentences. Many online thesauruses provide numerous example sentences. Acquiring the example sentences of the learning words from such a thesaurus and providing them may encourage users to reach the correct usage situation.
The other solution is for the system to check the correctness of the derived generalized words and give feedback on whether they are correct. If the derived generalized word is correct, example sentences that combine the learning word with the generalized word should exist. For instance, baked goods is a generalized word of cake and cookie, which are often used as objects of the learning word bake. If we search for the sentence "I bake baked goods" with Google, more than 3000 pages are found. However, if we search for the sentence "I baked brown thing," no page is found. Therefore, a mechanism that shows the search results for a sentence containing the learning word and the generalized word may help us check our generalization result and may prompt us to modify the generalized word if the number of search results is small.
In the current system, we need to select keywords from the example sentences by ourselves to create the nodes of the classification graph. In such a case, we may not always be able to select the correct words as keywords. The part of speech of the keywords for each learning word is defined in Table 1, so the keywords of example sentences can be extracted automatically by using a part-of-speech tagger. As future work, we need to introduce a part-of-speech tagger into the system and develop a mechanism that provides the keywords of the example sentences automatically.
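The search-based feedback proposed above as future work could take a form like the following sketch. Everything in it is hypothetical: the function names, the fixed sentence pattern "I <learning word> <generalized word>", and the threshold are assumptions, and `count_hits` stands for whatever search service would be used. The stub below returns illustrative numbers only, chosen to mirror the example in the text, and does not query any real search engine.

```python
def check_generalization(learning_word, generalized_word, count_hits,
                         min_hits=1):
    """Build a test sentence from the learning word and the generalized
    word, and flag the generalized word for revision when the sentence
    is found fewer than `min_hits` times."""
    query = f"I {learning_word} {generalized_word}"
    hits = count_hits(query)
    return {"query": query, "hits": hits, "revise": hits < min_hits}


# Illustrative hit counts (not measured); a real system would call a
# search service here instead of looking up a dictionary.
ILLUSTRATIVE_HITS = {"I bake baked goods": 3000}


def stub_count_hits(query):
    """Stand-in for a search service; unknown sentences get zero hits."""
    return ILLUSTRATIVE_HITS.get(query, 0)


good = check_generalization("bake", "baked goods", stub_count_hits)
poor = check_generalization("bake", "brown thing", stub_count_hits)
print(good)
print(poor)
```

Passing the hit-count function as a parameter keeps the check independent of any particular search service, so the threshold and the sentence pattern can be tuned separately from the way the counts are obtained.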