The relationships between test performance and students’ perceptions of learning motivation, test value, and test anxiety in the context of the English benchmark requirement for graduation in Taiwan’s universities

Background: Influenced by the trend towards internationalization of higher education, most universities in Taiwan have implemented an English benchmark requirement for graduation, which requires students to demonstrate their English ability at a specified Common European Framework of Reference for Languages (CEFR) level by taking a standardized English language test (e.g., GEPT, IELTS, TOEFL iBT). This practice has been increasingly criticized for failing to achieve its intended goals of enhancing students’ English language proficiency and increasing students’ career mobility. Therefore, there is an urgent need to explore the consequences of using standardized tests in support of the policy.


Background
In 2005, the Ministry of Education (MoE) began implementing the English graduation benchmark policy in universities across Taiwan, with the aim to encourage college students to pass a credible standardized English test before graduation. Behind the policy lies the belief that taking a standardized test helps students to prove their English competence, prepare for future employment or advanced studies, and also increase their competitive edge in the global environment. Authors like Shohamy (2000) and McNamara (2001) have asserted that language testing serves the function of gate-keeping in many contexts. Similarly, in the context of Taiwan's higher education, language testing has been utilized as a tool to enforce the power of the English graduation benchmark policy.
At present, more than 90% of universities in Taiwan have implemented the English graduation benchmark policy, with each university setting its own benchmark standard. As students from different universities exhibit varying levels of English competence, it is very difficult for the government to formulate uniform standards. However, the MoE's adoption of the Common European Framework of Reference for Languages (CEFR) provided institutions, schools, and the general public with a reference for various levels of language competence and testing requirements (Council of Europe 2001). According to the CEFR, language ability is divided into six levels, specifically A1, A2, B1, B2, C1, and C2, in ascending order from low to high competence. Most of the universities require that their students pass an English test equivalent to the CEFR B1 level upon graduation, whereas the top five universities set their graduation benchmark at the CEFR B2 level. Starting in 2016, National Taiwan University, Taiwan's most prestigious university, has set the benchmark at the CEFR C1 level for its English major students.
Most universities accept either Test of English for International Communication (TOEIC) or General English Proficiency Test (GEPT) scores as proof of their students' English proficiency. Each university sets its own requirement if students fail to pass the graduation benchmark. Some schools require that students retake the test until they pass, some ask students to participate in additional training if they fail the test twice, and others set various "backdoor policies," such as requiring students who fail to sign up for an extra course as a required condition for graduating (Chen 2014). Although the graduation benchmark policy is widely implemented across Taiwan, regulations and problems stemming from the policy have become highly controversial, causing dissatisfaction among students and teachers as they demand to have the policy reexamined (Her et al. 2013;Chang 2005).
The English graduation benchmark policy has attracted the interest of scholars in Taiwan, and some studies have investigated the effectiveness of implementing such a policy. Among existing studies, Chen and Liu (2007) and Shih (2008) used questionnaires and interviews to compare the learning motivation of students before and after the implementation of the benchmark policy. Their findings show that the policy has helped students develop a more positive attitude towards English learning, with over 50% of the students affirming the positive effect of the policy. In other words, the students surveyed generally agree that implementation of the policy has increased their motivation for learning English. Huang (2010) found that students show relatively low anxiety towards the benchmark policy. Students enrolled in a school without the policy tend to show higher anxiety than those who are required to pass the benchmark tests.
At present, there are still relatively few studies focusing on the English graduation benchmark policy, learning motivation, and test anxiety, and since the number of studied samples could be larger, more empirical research is needed to show that the policy helps to promote English learning motivation. Data show that college students in Taiwan spend a total of 1 billion new Taiwan dollars (about USD30 million) every 4 years on taking English standardized tests, but the effectiveness of these tests is limited, and there is no evidence that taking English standardized tests helps to increase students' learning motivation or English communication skills (Her et al. 2013).
This study seeks to contribute to this area of research. By sampling a greater number of students, we hope to better understand how students view this policy. Moreover, since learning motivation is a complicated cognitive factor (Cheng et al. 2014), the English graduation benchmark policy is not only related to learning motivation but also reflects students' attitudes towards English standardized tests, and how they feel about their English proficiency. To provide a more comprehensive analysis of students' views on the English graduation benchmark policy, this study will consider relevant factors such as students' learning motivation, their views on the importance of English tests, test anxiety, test performance, and the inter-relationships among them.

Motivation and performance
Motivation is a complex psychological trait; it is a hidden force within people that drives them to take action. "Learning motivation" can be defined as an internal thought process that triggers learners to achieve specific physiological or psychological goals and devote their effort voluntarily, while maintaining momentum throughout the learning process (Stipek et al. 1995). In general, motivation is considered one of the most important factors in determining a student's second language performance. However, the actual correlation between learning motivation and performance has not been accurately determined, and a precise definition of this construct is still lacking (Dörnyei 2001). In this regard, Eccles and Wigfield (2002) have summarized many theories on the relationship between motivation and performance, including intrinsic motivation theory, self-determination theory, flow theory, and goals theory. They also integrated theories of expectation and value construction, such as attribution theory, expectancy-value theory, self-worth theory, and theories integrating motivation and cognition.
Among these theories, a popular topic of research today is self-determination theory (SDT), proposed by Deci and Ryan in 1985. They argued that when an inherent positive belief persists in an individual, the person will work diligently to achieve his or her committed goal. In the process, the three inherent needs of human beings (namely autonomy, competence, and psychological relatedness) are fulfilled, which will bring about optimal progress and development for the individual. Ryan and Deci (2000) suggest that both intrinsic and extrinsic motivation can stimulate individual behavior and performance. While intrinsic motivation drives a person to continuously pursue better performance based on his or her interest in the subject and the desire to meet the needs of competence and autonomy, extrinsic motivation is often driven by some actual form of reward, which is considered outside the realm of self-determination, and only through internalization, integration, and regulation does it become part of the self-determination process.

Relationships among attitudes, motivation, test anxiety, and test performance
Although there is still considerable controversy surrounding the SDT proposed by Ryan and Deci, Cheng et al. (2014) agreed with Ryan and Brown (2005) that, compared to other motivation theories, SDT is more applicable to the discussion of the relationship between learning motivation and high-stakes testing. Ryan and Weinstein (2009) used SDT to explain the effect of high-stakes testing on teachers and learners. In high-stakes testing, learners' motivation for success is not static, but may vary according to people, time, environment, and other factors. Regardless of whether the test-taker has a set purpose for taking the test, the inherently complex relationship between the test-taker and the test may influence motivation as well. Ryan and Brown (2005) argued that the policy of test taking is predicated on using reward, punishment, and pressure on self-esteem to motivate learning. Noels (2005) suggested that the social and cultural contexts outside the classroom can greatly influence the motivation for learning a second or foreign language. The effect is even more evident when the purposes of the test or the stakes involved vary (e.g., entrance examination, placement test, immigration test). This can lead to different levels of test anxiety, which can in turn become a source of construct-irrelevant variance.
Research confirms that a person's attitude directly influences motivation and behavior (Ajzen 1985;Fishbein and Ajzen 1975). Ajzen (1985) proposed the Theory of Planned Behavior, suggesting that an individual's actions are not only influenced by his or her preference for certain people, things, events, and the subjective norm of society and others but also affected by the individual's desire to complete an action and the perceived behavioral control towards resources and opportunities.
A person's attitude towards specific things and events is also likely to affect motivation and anxiety. Pyun (2013) studied the relationship between task-based language learning and learning motivation and anxiety. The results showed that learner attitude and anxiety show significant negative correlation, while learner attitude and motivation show significant positive correlation. Test anxiety typically surfaces during certain cognitive performances for test takers, such as when they compare themselves with their peers, worry about the consequence of failing a test, experience low self-confidence, or are excessively worried about testing and assessment (In'nami 2006;Liu 2008). Different levels of test anxiety can be attributed to the individual's family background, social environment, and the teaching methods he or she is used to. For example, parents who expect outstanding test results from their children may be putting excessive pressure on them (Bodas and Ollendick 2005). Considering these points of view, test anxiety can negatively impact test performance, impeding test takers from achieving their full potential and strength (Elkhafaifi 2005;Meijer 2001). Many studies have shown a negative correlation between test anxiety and academic performance (Chapell et al. 2005;Ruthig et al. 2004;Putwain 2007).
Test anxiety also affects test takers' performance on foreign language tests, as shown by a study carried out by Tsai (2010). She reported that there was significant negative correlation (γ = −.49) between test anxiety and performance from the results of the listening section of the GEPT Intermediate Level Test. Her conclusion echoes Kunnan's (1995) view that the complex interaction between social and cognitive factors may influence the test results of students in different testing scenarios, depending on differences in test purpose and the stakes involved in the local context.
The study by Cheng et al. (2014) is one of the few examples using the application of SDT to the area of language learning. It was also the first time that empirical research has been used to examine the complex relationships among English learners' motivation, test value, test anxiety, and high-stakes test performance (i.e., test scores) from the perspective of social and cognitive factors. The study invited English learners from Taiwan, China, and Canada to take local high-stakes English proficiency tests, and a questionnaire which explored motivation, test anxiety, and perceptions of test importance and purpose to test-takers in each of the three contexts was conducted immediately after the test. The questionnaire responses were then cross-analyzed with the learners' test scores, and the relationships between English learning motivation and test performance under different social and educational contexts were compared. In the study, the high-stakes test used in Taiwan was the GEPT High-Intermediate Level Test, while the students sampled were university students. The results of the study showed a common phenomenon in all three contexts: the purpose of the test and learners' recognition of the importance of the test can affect motivation. Moreover, as Fig. 1 depicts, since motivation and anxiety exhibit a mutually influential relationship, these factors will affect the learners' overall test performance, along with other personal factors such as the test taker's age and gender (Cheng et al. 2014).
However, Cheng et al. pointed out that these inferences need to be confirmed by further research and different statistical methods. Moreover, the study's "test purpose" was filled in by respondents who completed the questionnaire and not defined specifically as "English graduation benchmark." Authors (2016) used the theoretical framework of Cheng et al., but included questions related to "English graduation benchmark policy" in the questionnaire. The study explored the views of 1620 students from two universities in northern Taiwan on the English graduation benchmark policy, as well as the relationships among learning motivation, test value, test anxiety, and test performance. The structural equation model (SEM) was applied to test the complex relationships among variables. The findings of their study show that university students generally hold a positive attitude towards the English graduation benchmark policy. SEM results show that the attitudes of university students towards the English graduation benchmark policy positively impact their perceived test value and their learning motivation.
Based on the preliminary results, authors (2016) suggest that their study serves the purpose of examining the consequences of using the GEPT as a benchmark test for graduation in the context of Taiwan's higher education. Yet, with regard to one limitation acknowledged by the researchers themselves, their study only used data from students of two universities that have established passing the GEPT High-Intermediate Level Test (equivalent to the CEFR B2) as their English graduation benchmark requirement. They speculated that the sampled students tend to show a stronger learning motivation and more autonomous learning behavior than the university students who have lower levels of English proficiency; thus, their attitudes towards the graduation benchmark policy are also more positive and optimistic. Therefore, they suggest that to confirm the findings of their study, more data needs to be gathered from other universities that use different levels of the GEPT as their graduation benchmark requirement. Although the High-Intermediate group data could be divided into passers and nonpassers in order to examine whether results might vary due to students' English proficiency, we considered it to be more effective if the results between two different GEPT levels could be compared directly because over 30% of the High-Intermediate nonpassers are very likely to pass the GEPT Intermediate based on the results of the GEPT vertical scaling studies (e.g., Wu and Liao 2010). Therefore, following the procedures employed in the previous study which used the GEPT High Intermediate data, the present study collected new data from one university that uses the Intermediate Level of the GEPT as its graduation benchmark requirement. 
By comparing the results from these two studies that used different levels of the GEPT and involved students who have different levels of English proficiency, we hope to obtain more insights into the context of using the GEPT scores to fulfill the graduation requirement and to justify the consequences of using the GEPT to encourage university students to learn more English. Given that the two studies were conducted by the same researchers and the latter one was carried out on the basis of the former with respect to the validation of the research instrument and the model to explain the relationship among students' attitudes towards the GEPT, learning motivation, test value, test anxiety, and test performance, this paper will report these two studies as one single study.

Research questions and hypotheses
Based on the literature previously discussed, two main research questions were addressed:
1. What are students' attitudes towards the English benchmark policy for graduation in Taiwan's universities? Are there differences between the High-Intermediate group and the Intermediate group?
2. What are the relationships between students' attitudes towards the graduation benchmark policy, their test performance (i.e., raw scores), and their perceptions of learning motivation, test value, and test anxiety? Are there differences between the High-Intermediate group and the Intermediate group?
To help explore the complex relationships between the variables, the following research hypotheses and research model were proposed (Fig. 2).
Hypotheses 1-4: The attitudes towards the English graduation benchmark policy have a significant positive effect on test value, learning motivation, and test performance.
Hypothesis 5: The attitudes towards the English graduation benchmark policy have a significant negative effect on test anxiety.
Hypotheses 6-9: Test value has a significant positive effect on learning motivation, test performance, and test anxiety.
Hypotheses 10-11: Learning motivation has a significant positive effect on test performance.
Hypotheses 12-13: Test anxiety and learning motivation show an interaction effect.
Hypothesis 14: Test anxiety has a significant negative effect on test performance.

Methods
As mentioned earlier, the study was carried out on the basis of the previous research (Authors 2016) with respect to the validation of the research instrument and the model to explain the relationship among the variables. Although the previous research based on the GEPT High-Intermediate data has been published (in Chinese), for the sake of clarity and coherence, a summary of the research procedures, which introduces the research instruments and explains how the best-fit model was established, is provided below. Other details of the previous study are not included here owing to space limitations. That said, the key findings of the previous study are discussed and compared with those of the new study, which used the GEPT Intermediate data. By doing so, we hope to provide more insights when addressing the research questions.

Measurement instruments
Two measurement instruments were used: the General English Proficiency Test (GEPT) and the questionnaire on learning motivation, test value, and test anxiety.

GEPT
The General English Proficiency Test (GEPT) is a five-level criterion-referenced EFL testing system, which targets English learners in Taiwan at all levels, from junior high school upwards. The development of the GEPT started as an in-house project of the Language Training and Testing Center (LTTC). Later, it was partially funded by Taiwan's Ministry of Education with the aims of promoting life-long learning and introducing a positive washback effect on the learning and teaching of English. Since its launch in 2000, the GEPT has been administered independently by the LTTC. Currently, the GEPT is the largest standardized English language test in Taiwan, taken by approximately 500,000 test takers at over 100 test sites around the country each year. Considerable validity evidence has been presented to support the use of the GEPT as a valid indicator of learners' English language proficiency (e.g., Chan et al. 2014; Liao 2016; Weir et al. 2013; Wu 2016). Currently, GEPT scores are not only considered as proof of English ability by government offices, schools, and employers domestically, but also increasingly recognized by universities around the world, including prestigious institutions in Hong Kong, Japan, France, Germany, the UK, and the US, as a means of measuring the English language ability of Taiwanese learners who are interested in pursuing further study overseas.
The test content of the GEPT is not only linked to the local English curriculum but also takes account of local cultural and social references. The levels of the GEPT, which are also linked with the CEFR empirically, are roughly equivalent to CEFR A2-C1 (e.g., Wu 2011).
The GEPT was designed as a skill-based test battery assessing both receptive (listening and reading) and productive (speaking and writing) skills. The test places equal weight on each of the four test components and has general level descriptors and skill-area level descriptors. Each GEPT level is administered in two stages. Test-takers who pass listening and reading (160 out of 240 score points) are allowed to register for speaking and writing (80 out of 100 score points). Those who pass both stages will automatically receive both a score report and a Certificate of General English Proficiency. More details about the GEPT and associated research are available at http://www.gept.org.tw.

The questionnaire
This questionnaire was adapted from the questionnaire used in the study by Cheng et al. (2014). However, to avoid negative associations students might have with "graduation benchmark" and "test anxiety," the study dispersed questions related to these two dimensions in the section related to test value. Cheng et al. in 2009 studied a sample of 538 test takers of the GEPT High-Intermediate Level Test, and the questionnaire proved to have good reliability and validity. Some of their questionnaire items were included in the questionnaire for this study, for example, "I study English for the satisfaction I gain from learning new things," "I study English for the purpose of finding an ideal job," and "I am under a lot of pressure to get good scores on this test." With a focus on attitudes towards the English graduation benchmark, the questionnaire in the study aimed to reflect students' views on the implementation of the benchmark policy; thus, items such as "I think it is reasonable to require college students to pass the GEPT High-Intermediate Level Test" and "I understand that my university encourages students to take the GEPT in order to enhance their English proficiency" were created. After piloting, the final version of the questionnaire contained 43 questions, of which 11 are related to test value, 8 to the benchmark policy, 4 to test anxiety, and 20 to learning motivation. The questionnaire uses a six-point Likert-type scale. The higher the number on the scale, the more important the students think the test is, the more positive the attitude towards the benchmark policy, the higher the test anxiety, and the stronger the learning motivation. The questionnaire was administered in Chinese.
Through a set of rigorous validation procedures, we were able to confirm the relevance between the questionnaire items and the latent variables previously established. Principal axis factoring with oblimin rotation initially extracted 7 factors with eigenvalues greater than 1, and the cumulative explained variance was 55.53%. After items that did not meet the retention criteria (a factor loading lower than .4, or loadings on multiple factors) were deleted, factor analysis was performed again, and in the end 5 factors and 24 items were retained, with the cumulative explained variance at 64.72% (see Table 1). Of the retained items, 5 were related to test value (factor loadings of .74 to .89, Cronbach's α of .90); 5 were related to the English graduation benchmark (factor loadings of .54 to .72, Cronbach's α of .73); and 3 were related to test anxiety (factor loadings of .78 to .86, Cronbach's α of .86). Learning motivation was divided into 6 items on intrinsic motivation (factor loadings of .52 to .84, Cronbach's α of .89) and 5 items on extrinsic motivation (factor loadings of .55 to .90, Cronbach's α of .84).
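As an illustration of the reliability coefficient reported above, the following is a minimal sketch of how Cronbach's α can be computed from item-level responses. The function name and the toy response data are hypothetical and are not taken from the study; the sketch assumes complete data (no missing responses) and uses the sample variance.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    `items` is a list of item-score lists: one inner list per questionnaire
    item, each holding one score per respondent (same respondent order).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("every item needs one score per respondent")

    # Total scale score of each respondent.
    totals = [sum(item[i] for item in items) for i in range(n)]

    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance(totals)
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical six-point Likert responses: 3 items, 5 respondents.
toy_scale = [
    [4, 5, 3, 6, 2],
    [4, 6, 3, 5, 2],
    [5, 5, 2, 6, 3],
]
alpha = cronbach_alpha(toy_scale)
```

Values closer to 1 indicate that the items of a scale vary together; the coefficients in the text (.73 to .90) would be obtained by applying the same formula to the real responses for each factor.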
Next, we used SEM to conduct reliability and validity analysis and to explore possible relationships between the latent variables. SEM can be used to test the theoretical assumptions between observed variables and latent variables and to verify whether hypothetical relationships between observed variables can be supported by empirical data. In general, a model can be established based on previous research, theory, general knowledge, or a hypothesis proposed by the researcher. The correlation-covariance matrix of the estimated model and that of the observed data are then compared, using the chi-squared test to test the difference between the two. The smaller the chi-squared value is, the greater the indication that the data fit the model, meaning the model reflects the observed data more accurately.
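For readers unfamiliar with how this comparison is quantified, the standard textbook form of the discrepancy function under maximum likelihood estimation (the estimator the study reports using) can be written as follows. The notation is an assumption introduced here for illustration: S is the sample covariance matrix, Σ(θ) the model-implied covariance matrix, p the number of observed variables, and N the sample size. The source does not state which variant of the statistic the software reports, so the (N − 1) multiplier should be read as one common convention rather than a description of the authors' computation.

```latex
F_{\mathrm{ML}}(\theta)
  = \ln\lvert \Sigma(\theta) \rvert
  + \operatorname{tr}\!\left( S\,\Sigma(\theta)^{-1} \right)
  - \ln\lvert S \rvert - p ,
\qquad
\chi^{2} = (N - 1)\, F_{\mathrm{ML}}(\hat{\theta})
```

When the model-implied matrix reproduces the sample matrix exactly, the discrepancy is zero, which is why a smaller chi-squared value indicates a closer fit.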

Parameter calibration and model-fit analysis
By using statistical software AMOS 23.0, the relationships among attitudes towards the graduation benchmark, test value, test anxiety, learning motivation, and test performance were analyzed. The maximum likelihood estimation (MLE) was applied to calibrate the parameters, and after a series of calibration, model evaluation, and theoretical modification of the model, the final model was established (Fig. 3). Before exploring the relationships among latent variables, we needed to assess whether the model fits the data, which can be determined according to the following indicators. The first indicator is the chi-squared value; if the chi-squared value is not significant, then the model fits the data. However, because the chi-squared value may increase due to the number of samples and the complexity of the model, we also need to examine the goodness-of-fit index (GFI), comparative fit index (CFI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR). When an optimal model needs to be chosen from numerous competitive models, this can be determined by comparing the Akaike Information Criterion (AIC) with the Bayesian Information Criterion (BIC): the smaller the value, the more parsimonious the model.
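To make one of the supplementary indices concrete, the sketch below computes the RMSEA from a chi-squared value, its degrees of freedom, and the sample size, using a commonly cited formula. The function name and the input values are hypothetical; the degrees of freedom of the study's models are not given in this section, so no attempt is made to reproduce the reported values. The use of N − 1 in the denominator is an assumption, since some programs use N instead.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation.

    chi2: model chi-squared value
    df:   model degrees of freedom (must be positive)
    n:    sample size (assumption: N - 1 is used in the denominator)
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n greater than 1")
    # The index is floored at zero when chi2 is smaller than df.
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Hypothetical inputs, for illustration only.
example = rmsea(chi2=300.0, df=100, n=501)
```

Because the excess of the chi-squared value over its degrees of freedom is divided by the sample size, the index is less sensitive than the raw chi-squared test to large samples, which is the concern raised in the paragraph above.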
To achieve a parsimonious model, we removed the paths that did not reach statistical significance from the hypothetical model and, according to the modification index (MI), added a correlation path between the residuals of intrinsic motivation and extrinsic motivation. Intrinsic motivation and extrinsic motivation were originally two sublatent variables of learning motivation; thus, one way of modifying the model was to add a higher-level latent variable (learning motivation) above the two variables, while another possibility was to add a correlation between the residuals of the two latent variables. In an attempt to better understand the impact of the graduation benchmark on intrinsic and extrinsic motivation, we decided on the latter and modified the model accordingly (Fig. 3). The fit of the original and modified models can be compared in Table 2. We then used the chi-square difference test to determine which model has a better fit. The results of the chi-square difference test also showed that the modified model was superior to the original model (Δχ² = 492.37, df = 6, p < .01) and further confirmed that adding a correlation between the intrinsic motivation and extrinsic motivation residuals significantly improved the model fit.
Note: TV = test value, AGP = attitudes towards graduation policy, TA = test anxiety, IM = intrinsic motivation, EM = extrinsic motivation.
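The chi-square difference test reported above can be checked with a short calculation. The sketch below is an illustration, not the authors' procedure: it uses the closed-form upper-tail probability of the chi-squared distribution, which is valid only for an even number of degrees of freedom (as is the case for df = 6), and the function name is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable.

    Closed form valid only when df is a positive even integer:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x < 0:
        raise ValueError("x must be non-negative")
    half = x / 2.0
    term = 1.0   # the i = 0 term
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Values reported in the text: chi-square difference of 492.37 with df = 6.
p_value = chi2_sf_even_df(492.37, 6)
significant = p_value < 0.01
```

With a difference this large relative to its degrees of freedom, the p-value is far below the .01 threshold, in line with the conclusion that the modified model fits significantly better.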

Reliability and validity analysis
After the modified model was established, construct reliability (CR) and average variance extracted (AVE) were calculated using the factor loadings between the latent variables and observed variables. The CR refers to the reliability of the latent variables, while AVE refers to the validity of the latent variables. Fornell and Larcker (1981) suggested that the CR of the latent variables should reach .60, while the AVE should reach .50. It can be seen from Table 3 that the CRs of all the latent variables are above .60, indicating the modified model is acceptable. In terms of AVE, with the exception of the AVE of the attitudes towards the policy being less than .50, the AVEs of all other latent variables are above .50. In practice, it is difficult for the AVE of all latent variables to pass the .50 threshold, as this would mean the factor loadings need to be higher than .71 (.71 squared = .50). For this reason, Fornell and Larcker have proposed that a model with five latent variables can be considered acceptable if among the five latent variables, the AVEs of three to four latent variables are higher than .50, while the AVEs of the other variables are higher than .30 or .40. This indicates that both the questionnaire and the model show good reliability and validity.
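The two indices can be sketched from standardized factor loadings as follows. This is a minimal illustration under the assumption that the loadings are standardized, so that each item's error variance is one minus its squared loading; the function names and the example loadings are hypothetical and are not the values in Table 3.

```python
def construct_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    sum_loadings = sum(loadings)
    # Assumption: standardized loadings, so error variance = 1 - loading^2.
    sum_errors = sum(1.0 - l * l for l in loadings)
    return sum_loadings ** 2 / (sum_loadings ** 2 + sum_errors)


def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


# Hypothetical latent variable whose four items all load at .71.
loadings = [0.71, 0.71, 0.71, 0.71]
cr = construct_reliability(loadings)
ave = average_variance_extracted(loadings)
meets_cr = cr >= 0.60
meets_ave = ave >= 0.50
```

The example reflects the point made in the text: an AVE of .50 corresponds to loadings of about .71, which explains why latent variables with several loadings in the .50s or .60s can fall short of the AVE threshold while still reaching the CR threshold.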

Participants
The GEPT Intermediate data were collected in 2016 to confirm the model established from the results of the study with the High-Intermediate data (Authors, 2016). The new data were collected from one university in northern Taiwan that uses the GEPT Intermediate Level Test as its English graduation benchmark. A total of 624 respondents answered the same questionnaire used by the High-Intermediate group, of which 570 were valid samples (282 male, 50%; 288 female, 50%), giving the questionnaire an effective return rate of 91%.

Questionnaire responses
Overall
The Appendix reports the analyses of both ability groups. Comparing the two groups on "test value," the Intermediate group's perception of the GEPT was significantly more positive than that of the High-Intermediate group. The same tendency was evident in "attitudes towards graduation policy." Yet, a reverse pattern was observed in "test anxiety," which means that the Intermediate group felt more pressure than the High-Intermediate group from taking the GEPT. As for "learning motivation," both ability groups reported strong motivation, particularly in the aspect of "extrinsic motivation." The following table summarizes the comparison by each variable between the two ability groups (Table 4).

GEPT test scores
In terms of test performance, the total mean score was 154.07 for the Intermediate group. The following table presents the test score analysis for the Intermediate group together with that for the High-Intermediate group. The mean scores of both groups resemble those of the regular GEPT administrations during 2014-2016 (Table 5).

Parameter calibration and model-fit analysis
Given the large sample size and that the data of questionnaire responses and test scores were approximately normally distributed, the maximum likelihood estimation (MLE) was considered appropriate for the study. Figures in Tables 6 and 7 show that the modified model which was established with the GEPT High-Intermediate data fits the GEPT Intermediate data well (χ² = 1055.17, GFI = .90, CFI = .91, RMSEA = .07, SRMR = .05).
Examining the effect of the students' attitudes towards the policy on the various latent variables for the Intermediate group, it can be seen that attitudes towards the policy have a significant positive effect on perceived test value, with a path coefficient of .53; attitudes towards the policy also have a positive effect on both extrinsic and intrinsic motivation, with path coefficients of .62 and .56, respectively. Of the two, the effect on extrinsic motivation is slightly higher than that on intrinsic motivation. There is no significant correlation between attitudes towards the policy and test performance. In terms of test anxiety, results show that attitudes towards the policy have a positive effect on test anxiety (β = .21), meaning that students with a more positive attitude towards the policy tend to experience an increased level of anxiety when taking the test.
The relationships between test value and the other latent variables were also examined. Test value has a positive effect on test anxiety (β = .22), but shows no significant relationship with intrinsic motivation or extrinsic motivation. As for the effect of learning motivation on test performance, results show that only extrinsic motivation has a positive effect (β = .17) on test performance, while there is no significant relationship between intrinsic motivation and test performance. In other words, to improve students' performance on English tests, it is important to increase their extrinsic learning motivation. In addition, there is no relationship between test anxiety and learning motivation. Results also show a significant negative effect (β = −.29) of test anxiety on test performance. The relationships among learning motivation, test anxiety, and test performance in this study echo those reported by Cheng et al. (2014).
The path coefficients for both the High-Intermediate and the Intermediate group are shown in the modified model, with the information about the Intermediate group presented in a box (see Fig. 4). The results show that, like the GEPT High-Intermediate group, the GEPT Intermediate group perceived the graduation requirement positively, and their perception had a positive relationship with test value and learning motivation (both intrinsic and extrinsic). However, four differences between the two sample groups were found. First, the relationships among the variables for the Intermediate group are stronger than those for the High-Intermediate group. Second, unlike the High-Intermediate group, who demonstrated a negative relationship between attitudes towards the graduation requirement and test anxiety, the Intermediate group demonstrated a positive relationship between the two variables. In other words, for students who are less proficient in English, the more positively they perceive the graduation requirement, the greater the test anxiety they feel. Third, an indirect effect of the graduation benchmark on test performance was observed via a different path in each group: intrinsic motivation for the higher group and extrinsic motivation for the lower group.
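In path analysis, an indirect effect along a chain of paths is conventionally estimated as the product of the standardized coefficients on that chain. The sketch below applies this product rule to the Intermediate-group coefficients reported above. The function name `indirect_effect` is hypothetical, and the resulting values are illustrative arithmetic rather than estimates reported by the study; no significance test is implied.

```python
from math import prod

# Standardized path coefficients for the Intermediate group, as reported
# in the text. The variable labels are chosen here for illustration.
paths = {
    ("attitudes", "extrinsic_motivation"): 0.62,
    ("extrinsic_motivation", "test_performance"): 0.17,
    ("attitudes", "test_anxiety"): 0.21,
    ("test_anxiety", "test_performance"): -0.29,
}


def indirect_effect(chain, coefficients=paths):
    """Multiply the coefficients along consecutive links of `chain`."""
    return prod(coefficients[(a, b)] for a, b in zip(chain, chain[1:]))


# Attitudes -> extrinsic motivation -> test performance
via_motivation = indirect_effect(
    ["attitudes", "extrinsic_motivation", "test_performance"]
)
# Attitudes -> test anxiety -> test performance
via_anxiety = indirect_effect(
    ["attitudes", "test_anxiety", "test_performance"]
)

if __name__ == "__main__":
    print(f"via extrinsic motivation: {via_motivation:.4f}")
    print(f"via test anxiety: {via_anxiety:.4f}")
```

Under these assumptions, the path through extrinsic motivation is positive and the path through test anxiety is negative, which mirrors the opposing roles of motivation and anxiety described in the results.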

Conclusions and discussion
This section discusses the results drawn from the two studies corresponding to the research questions. The findings show that university students, regardless of whether they belonged to the High-Intermediate group or the Intermediate group, generally hold a positive attitude towards the English graduation benchmark policy, which is consistent with the results reported by Chen and Liu (2007) and Shih (2008). Results further reveal that the Intermediate group shows more positive attitudes towards the graduation requirement than the High-Intermediate group. In addition, students' attitudes towards the graduation requirement have positively affected their perceptions of test value and their learning motivation and, indirectly through motivation, their test performance. In the aspect of learning motivation, the results of our study show that students' extrinsic motivation for learning English is stronger than their intrinsic motivation. A study by Wu and Lin (2009) discussing college students' motivation for English learning showed that "instrumental motivation" scored the highest among different types of motivation. This finding echoes the sociological viewpoint of instrumental motivation proposed by Gardner (1985), which refers to how learners study a language for the purpose of progress and development, such as career advancement or further studies. Items in this study that are related to extrinsic motivation include "I study English for the purpose of finding a better job" and "because I would like to perform better at school or at work." Despite the similarities in students' attitudes towards the graduation requirement between the two groups, results show that the Intermediate group's attitudes towards the policy have a positive effect on test anxiety (β = .21), whereas the High-Intermediate group's attitudes towards the policy have a negative effect on test anxiety (β = −.17). This suggests that students with a lower level of English proficiency are more sensitive to the pressure from the graduation requirement and more likely to experience an increased level of anxiety when taking the test. In view of this, schools and teachers should be aware of the risk of increasing students' anxiety when implementing the graduation benchmark requirement.
RQ 2: What are the relationships between students' attitudes towards the graduation benchmark policy, their test performance, and their perceptions of learning motivation, test value, and test anxiety? Are there differences between the High-Intermediate group and the Intermediate group?
SEM results show that the attitudes of university students towards the English graduation benchmark policy positively impact their perceived test value and their learning motivation. However, there is no significant relationship between the attitudes towards the policy and test performance. The attitudes of students towards the English benchmark policy have a positive effect on their motivation for learning English, and this motivation is in turn reflected in their behavior. According to the theory of planned behavior (Ajzen 1985), in order for students to take the initiative to study English and improve their English ability, it is important to change their attitudes towards learning English, making it fun and interesting rather than something done for the sole purpose of passing the GEPT and meeting the graduation requirement. In addition, the results show that test value has a positive effect on test anxiety. Students who feel that the test results are more important for the purpose of obtaining a degree, a scholarship, or a job are more likely to feel anxious when facing a test. However, the results concerning the relationship between students' attitudes towards the graduation requirement and test anxiety are more complicated. Among the High-Intermediate group, students who had a positive attitude towards the English graduation benchmark were less likely to feel anxious in the face of the test. In contrast, a positive relationship between students' attitudes towards the graduation requirement and test anxiety was evident among the Intermediate group. In other words, among the Intermediate group, those who perceived the graduation requirement more positively felt more anxious about the test. Regardless of test group, test anxiety has a negative effect on test performance; students who were more anxious about the test showed poorer performance (Cassady and Johnson 2002). In this regard, we suggest that schools provide relevant resources to help students, especially those with lower levels of English proficiency, reduce their test anxiety, for example, by inviting the testing organization behind the GEPT to provide more information about how to prepare for the test or to give mock tests.
Another noticeable difference between the two ability groups lies in the aspect of motivation. While the Intermediate group students were inclined to be driven by extrinsic motivation, the High-Intermediate group students tended to be more intrinsically motivated. The finding is consistent with Maslow's hierarchy of needs (1943, 1954) in that before learners can be intrinsically motivated, they must first be provided with external rewards such as good grades. In view of this, the English graduation benchmark requirement can be considered a useful measure to motivate students to learn, especially those who are at a lower level of English proficiency. However, it is insufficient for schools simply to impose the graduation requirement upon their students. More importantly, schools must provide the necessary remedial courses to help less proficient students build confidence in improving their English proficiency and, at the same time, in coping with the graduation benchmark requirement. Unlike the Intermediate group students, the High-Intermediate group students may not need a graduation benchmark requirement to encourage them to learn more English because they are willing and eager to learn new material for self-actualization. However, their schools should ensure that these students' English learning experience is more meaningful, so that they can engage in deeper learning and fully understand the material.
Finally, schools should be aware that implementing the English graduation benchmark policy is only one of many ways of assessing students' English ability and not the sole reason to set up English courses. At the same time, we believe universities should consider the differences in students' background, learning objectives, and learning status when establishing the English graduation benchmark policy. Since students differ in qualifications and personal traits, and some may not have kept up with the school's curriculum in their learning process, they should be allowed to set their own expectations and English learning goals according to their ability, resources, and opportunities. In short, although the implementation of the English graduation benchmark policy is based on good intentions, as shown by related studies (Her et al. 2014), applying a fixed standard established by the school or the department may not be suitable for every student.
The questionnaire used in this study was adapted from the questionnaire "Learning Motivation, Test Anxiety and English Test Performance" designed by Cheng et al. (2014). We therefore compared the results of this study with those of Cheng et al. and attempted to discuss possible reasons for the similarities and differences. Both studies show that the test purpose is positively correlated with learning motivation and test anxiety and that test value is positively correlated with learning motivation. However, our study showed that test value is positively correlated with test anxiety, while Cheng et al. found no significant correlation between test value and anxiety. This difference may be due to the fact that the latter study analyzed the results of all three tests, namely Taiwan's GEPT, China's CET, and Canada's CAEL, as a whole instead of individually. Another reason may lie in the characteristics of the samples. The GEPT samples gathered by Cheng et al. were from students who passed the listening and reading tests and thus formed a homogeneous ability group, while the present study did not limit the samples to those who passed the listening and reading tests, so its sample can be considered more heterogeneous.

Limitations
Although the two studies are the first attempt to draw on a large sample of university students to explore the complex relationships among the English graduation benchmark policy, test value, learning motivation, test anxiety, and test performance, they have a number of limitations. First, these studies only used data of students from three universities that have established passing the GEPT High-Intermediate Level Test (equivalent to the CEFR B2) or the GEPT Intermediate Level Test (equivalent to the CEFR B1) as their English graduation benchmark. The sampled students, especially the High-Intermediate group, tend to show strong learning motivation and autonomous learning behavior; thus, their attitudes towards the graduation benchmark policy are also positive and optimistic. Consequently, more data are needed to determine whether the findings of this study can be applied to universities that use other English proficiency tests as their graduation benchmark requirement. In view of this, we suggest that future researchers gather data from a large sample of universities that use the other two levels of the GEPT, i.e., elementary (equivalent to the CEFR A2) and advanced (equivalent to the CEFR C1), or other tests (e.g., TOEIC, IELTS) as their graduation benchmark requirement. Second, in addition to using SEM to analyze the relationships among latent variables, hierarchical linear modeling (HLM) can also be applied when a larger amount of data is available. Factors such as the student's department and school can be integrated into the analysis, as these factors are likely to influence one's attitude towards the English graduation benchmark policy. Third, open-ended questions should be included in the questionnaire to gain a more comprehensive understanding of the views of university students towards the English graduation benchmark policy, as well as of how their English learning process is influenced by the implementation of this policy.