Reconsidering the Assessment Policy: Practical Use of Liberal Multiple-choice Tests (SAC Method)

Examinees' performances are assessed using a wide variety of techniques, and multiple-choice (MC) tests are among the most frequently used. Nearly all standardized achievement tests make use of MC test items, and there is a variety of ways to score these tests. This study compares the number right and liberal scoring (SAC) methods. A mixed methods sequential explanatory research design, consisting of both quantitative and qualitative analysis of the data, was used. A test with ten questions was administered to 73 purposively selected prospective English teachers, who were asked why they had chosen more than one option in the second part of the test. Priority was on the quantitative data obtained from the test results; qualitative data were collected from participants' explanations for their answers, and the analysis of the qualitative findings was used to explain the quantitative results. The results reveal that the liberal scoring method rewards partial knowledge and penalizes blind guessing. It is superior to the conventional scoring methods as it eliminates their disadvantages. However, the liberal scoring method is difficult to use practically in the classroom; without technological help, teachers may find it very demanding. The study therefore also provides teachers with a Microsoft Excel document for the practical use of liberal MC tests, with which they can easily conduct liberal MC tests in their exams. Suggestions about the use of the liberal scoring method are provided for test designers and policymakers at both national and international levels at the end of the study.


Introduction
Testing is an indispensable part of the teaching and learning process. It includes a wide variety of techniques to assess students' performances. Of these techniques, multiple-choice tests are frequently used by institutions, educators, test designers, and teachers. However, some researchers doubt whether multiple-choice tests properly measure students' true performances. Madsen (1983) claims "while multiple-choice tests can be used successfully in testing grammar, they do not seem to work as well in testing conversational ability" (p. 38). Though multiple-choice tests are not good at testing all language skills, they are still used in many English proficiency tests at both national and international levels. In Turkey, English proficiency tests such as YDS and YOKDIL (foreign language exams) make use of multiple-choice test items. Many examinees take these exams, and their exam results must be assessed objectively. Thus, multiple-choice tests are the best choice for standardized achievement tests if practicality is the main concern. These tests provide "high score reliability, ease of administration and scoring, usefulness in testing varied content, and objective scoring" (Kurz, 1999, p. 3). However, their scoring methods vary, and teachers, test designers, institutions, and educators need to make use of alternative scoring methods for multiple-choice tests.

Literature Review
Multiple-choice questions take many forms. Hughes (1989) points out that "there is a stem and a number of options, one of which is correct, the others being distracters" (p. 59). The question can be given either through an incomplete sentence or through a full question. Despite their advantages, these test items have some disadvantages such as "decreased validity due to guessing and failure to credit partial knowledge" (Kurz, 1999, p. 2). Many scholars have provided suggestions to overcome such disadvantages, and different scoring methods for multiple-choice tests have been created. Ng and Chan (2009) and Lesage, Valcke, and Sabbe (2013) summarized multiple-choice test scoring methods and grouped them into conventional and non-conventional scoring methods. While number right scoring (Kurz, 1999) and negative marking (Betts, Elder, Hartley, & Trueman, 2009) are among the conventional scoring methods, the liberal multiple-choice (MC) test (Bush, 2001), elimination testing (Coombs, Milholland, & Womer, 1956), confidence marking (Gardner-Medwin, 1995), the two-stem multiple-choice question / permutational multiple-choice question (Farthing, Jones, & McPhee, 1998), and probability testing and the order-of-preference scheme (Ben-Simon, Budescu, & Nevo, 1997) are among the non-conventional ones.
Nearly all of the multiple-choice tests applied in Turkey, including the standardized achievement tests, make use of either number right scoring or negative marking. In number right scoring, students are told to pick one of the choices. There is one correct answer, and this scoring method limits students' use of their partial knowledge. Unmarked answers and incorrect answers have a value of zero, so the method encourages blind guessing, and the total test score is simply the number of correctly marked options. In negative marking, students are penalized when they mark incorrect options. Though it is used to discourage blind guessing, this type of scoring also discourages students from using their partial knowledge in their exams (Jennings & Bush, 2006).
To reward students' partial knowledge, Bush (2001) made use of liberal / free-choice MC tests; his team used four-answer questions in their tests and later shared their experiences of using liberal MC tests. Jennings and Bush (2006) compared the conventional number right scoring method and the liberal / free-choice scoring method theoretically. This study aimed to examine the differences between the two scoring methods in practical classroom use and to answer the following research questions: 1. What are the differences between the number right and liberal scoring methods? 2. Does the liberal scoring method reward partial knowledge and prevent blind guessing?

Methodology
A mixed methods sequential explanatory research design, consisting of both quantitative and qualitative analysis of the data, was used (Creswell, 2014; Ivankova, Creswell, & Stick, 2006). In the first (quantitative) part of the study, students' answers to the questions were presented using frequencies, on the basis of which the difference between the two scoring methods was explained.
The second (qualitative) phase of the study helped explain and elaborate on the quantitative results obtained in the first phase (Ivankova, Creswell, & Stick, 2006). The researcher collected the quantitative and qualitative data together, but analyzed the quantitative data first because priority was on the quantitative data. Later, the analysis of the qualitative findings was used to explain and interpret the results of the quantitative data. The quantitative and qualitative results were integrated in the conclusion and implications part (Creswell, 2014).

Data Collection Tool
In this mixed methods design, the researcher administered a test with ten questions as the primary data collection tool. The questions had previously been asked in standardized achievement tests in Turkey such as KPDS (foreign language proficiency examination for state employees), YDS (foreign language exam), and UDS (interuniversity council foreign language exam). As the students' answers to the questions give only a limited picture on their own, the second part of the answer sheet aimed to explain why students had chosen more than one answer in the "Scoring All Choices" part; they were asked to explain how they had found the correct answer in this open-ended part of the answer sheet. The researcher and two other instructors chose the questions and formed an answer sheet to collect the data (see Appendix A).

Validity and Trustworthiness
Not only for quantitative studies but also for qualitative ones, the major concerns are the accuracy of the findings and the correct interpretation of the data (Creswell, 1998). To address these concerns, the researcher followed the principles suggested by Creswell and Miller (2000) and made use of investigator triangulation and peer debriefing. One more coder helped the researcher during the analysis of the data. The other coder was trained in the assessment procedure of the data collection tool. The coder and the researcher analyzed the data separately; later, they came together, discussed whether there were any inconsistencies, and reached full consensus. Moreover, the researcher consulted an expert in the field of Evaluation and Measurement about the face validity of the answer sheet used to collect the data.

Research Context
In this study, both the number right scoring and liberal scoring methods were used in one test. In the first part (number right scoring), 1 mark is awarded for a correctly chosen option while 0 marks are awarded for an incorrectly chosen one. In the second part (liberal scoring), in a question with N options and one correct answer, 1 mark is awarded for a single correctly chosen option and -1/(N-1) for each incorrect one (Bush, 2001; Jennings & Bush, 2006; Warwick, Bush, & Jennings, 2010). The formula can be difficult to figure out for some, especially for teachers aiming to use this method in their classes. Therefore, the following figure characterizes students' levels of knowledge for a given answer, following Bradbard, Parker, and Stone's (2004) classification. To explain the scoring method simply, suppose that the teacher asks 25 multiple-choice questions in a test and gives 4 points for each question, which makes 100 in total. As in questions 5 and 6 in the following figure, a student leaving all the alternatives unmarked (0 points) gets the same score as a student marking all the alternatives [+4 points (1 correct option) -4 points (4 incorrect options) = 0 points]. As students are allowed to mark all the alternatives and the teachers are supposed to score all choices, I named the method the 'Scoring All Choices' (SAC) method (Cesur, 2009).
The name "SAC method" is used in place of liberal (free-choice) scoring method in this study.

[Figure: example questions showing the correct option, the options marked by the student, and the resulting SAC scores]
Moreover, in question 4, the student does not know the correct answer; the only thing he or she knows is that 'E' is not the correct one. Therefore, he or she leaves that option unmarked and gets 1 point out of 4 for that partial knowledge. As for preventing blind guessing, this method works well as it penalizes incorrectly marked options. As in questions 7-10 in the figure above, students get -1 point for each incorrect option. If they do not know the answer for sure, they need to leave the options unmarked; otherwise, they will lose a point for each incorrect answer.
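For readers who prefer code to formulas, the per-question rule can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original study; the function name and defaults are mine, using the 5-option, 4-points-per-question example above.

```python
def sac_score(marked, correct, n_options=5, points_per_question=4):
    """Score one question under the SAC (liberal) scoring method.

    A correctly marked option earns full points; each incorrectly
    marked option costs points / (n_options - 1), so marking every
    option (or marking none) yields 0 points.
    """
    penalty = points_per_question / (n_options - 1)
    score = 0.0
    for option in marked:
        score += points_per_question if option == correct else -penalty
    return score

# Marking all five options cancels out, just like leaving them all blank:
print(sac_score({"A", "B", "C", "D", "E"}, correct="C"))  # 0.0
# Partial knowledge: ruling out only 'E' still earns 1 of 4 points:
print(sac_score({"A", "B", "C", "D"}, correct="C"))       # 1.0
```

This reproduces the figure's examples: a blank answer and a fully marked answer both score 0, while a student who can only eliminate one wrong option is credited for that partial knowledge.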

Participants
The researcher made use of purposive sampling, which means he chose specific people with a purpose in mind (Ritchie, Lewis, & Elam, 2003). The participants were informed about the SAC method for 6 lesson hours. Before they actually took the exam for the main study, they had taken 3 different quizzes that were scored using the SAC method. Their answers were discussed and they became familiar with the scoring method.
Still, 3 students either did not understand or misunderstood the scoring method in the main study.
Therefore, their answers were omitted and 70 prospective teachers' answers to 10 different questions were analyzed.

Data Collection and Analysis
Data were collected from 70 prospective teachers of English, and 700 questions in total were analyzed to see the difference between the two scoring methods. In the first part of the data collection instrument (see Appendix A), participants gave answers to the questions in two different columns: one for number right scoring, the other for liberal (SAC method) scoring. In the second part, they explained why they had chosen more than one option in the SAC method. Their answers to the questions were transcribed into a Microsoft Excel worksheet. The quantitative data obtained from the test were analyzed using descriptive statistics, considering the different knowledge levels of the participants; each knowledge level was analyzed separately. The frequencies of the answers given to the questions provided a valuable source for comparing the two scoring methods. To analyze the qualitative data, Strauss and Corbin's (1998) coding stages were followed. Codes and subcodes were created from the transcriptions of the answers given to the open-ended questions. These codes and subcodes were then grouped into categories and themes considering the classification of the participants' knowledge levels (Bradbard et al., 2004). Finally, they were analyzed and used to explain why the participants chose more than one option to find the correct answer.

Differences between Two Scoring Methods
In general, the SAC method was not student-friendly as the questions were difficult for the participants. Similar to what Bush (2001) experienced, "the better students understood and mostly liked the new test format, while poor students strongly disliked it" (p. 161). When the total scores are examined, it can be seen that only 11 of the 70 participants got higher scores with the SAC method than with number right scoring, and these 11 students were also among the ones who got the highest marks under number right scoring. Those who were not sure of the correct answer, or who did not know it, got fewer points than they obtained from number right scoring. For example, the most successful student got 34 out of 40 in the SAC method while the least successful one got -6. The most successful student got 2 more points in the SAC method than from number right scoring (32 points), whereas the least successful student got 10 points less than from number right scoring (4 points). Bush (2001) was completely right in his argument that in liberal MC tests "the difference in test scores between the best and worst student can be very wide" (p. 162). Participants got full points in number right scoring in a considerable number of questions; however, in only 198 of them were they sure of the answer. Participants lost 1, 2 or 3 points just because they were not sure of the correct answer and chose more than one option. Moreover, in number right scoring, participants get 0 points both for the absence of knowledge and for misinformation. While in 368 questions participants got 0 points in number right scoring, in only 24 questions did they get the same score in the SAC method. This means that they really did not know the answer in only 24 of those 368 questions. In the remaining 344 questions, they were either rewarded for their partial knowledge or penalized for the misinformation they had.
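The contrast between the two methods can be made concrete with a small worked example. The answer patterns below are invented for illustration and are not the study's data; scoring follows the 4-points, 5-option rule described earlier.

```python
def number_right_score(marked, correct, points=4):
    # Number right: full points only for a single, correct mark; otherwise 0.
    return points if marked == {correct} else 0

def sac_score(marked, correct, n_options=5, points=4):
    # SAC: full points per correct mark, -points/(N-1) per incorrect mark.
    penalty = points / (n_options - 1)
    return sum(points if o == correct else -penalty for o in marked)

# Three hypothetical questions: a sure answer, a hedged pair, a blind guess.
answers = [({"B"}, "B"),        # certain and correct
           ({"B", "D"}, "B"),   # partial knowledge: narrowed down to two options
           ({"A"}, "E")]        # blind guess, wrong

nr = sum(number_right_score(m, c) for m, c in answers)
sac = sum(sac_score(m, c) for m, c in answers)
print(nr, sac)  # prints: 4 6.0
```

Number right scoring credits only the certain answer (4 points); the SAC method additionally rewards the hedged pair (+3) and penalizes the blind guess (-1), giving 6 points, which is the pattern the results above describe.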

Partial Knowledge and Blind Guessing
To see the number of questions in which participants' partial knowledge was rewarded and the misinformation they had was penalized, the questions were analyzed one by one. The following Table shows the frequencies of the scores they got from number right scoring and the SAC method.

[Table: the number of questions at each score under number right scoring and the SAC method; total = 238 questions]

It is clearly seen in the Table that the SAC method not only rewarded partial knowledge but also penalized blind guessing. Participants' explanations clearly show that they lose points when they are not sure of the answer (19.14 percent of the questions) or when they do not know the answer (34 percent of the questions).
In nearly 53 percent of the questions, participants lost points. Therefore, in our case, the SAC method (liberal MC testing) both penalized blind guessing and rewarded partial knowledge. However, the points lost as penalties for misinformation outnumbered the points awarded for partial knowledge.

Conclusion and Implications
A variety of ways can be used to score multiple-choice tests, and many research studies have focused mainly on the comparison of conventional scoring methods. This study compared the number right and liberal scoring (SAC) methods using a test with 10 multiple-choice questions. As the number of participants is limited, the findings of this study may not be applicable to interpreting patterns in other research settings. "What appears to work well in one setting does not in another or in a replication" (Frary, 1989, p. 92). The results are specific to this study and reveal that the liberal scoring method rewards partial knowledge and discourages blind guessing. The method eliminates the effects of students' lucky guesses and encourages students to use their partial knowledge. There is no single best scoring method for testing students' performance; however, this scoring method is superior to the conventional ones as it eliminates their disadvantages. In the case of partial knowledge, "liberal/free-choice tests are more generous than conventional tests" (Jennings & Bush, 2006, p. 4). The participants got higher scores for their partial knowledge. Bush (2001) proposes liberal MC tests as the best method for anyone wishing to use MC tests to assess examinees' partial knowledge. However, the participants lost more points due to blind guessing. They should have been informed more thoroughly about the penalty for blind guessing so that they would not have gambled. Since "the extent to which they are punished increases with the amount of 'misinformation' per question" (Jennings & Bush, 2006, p. 5), examinees had better leave questions about which they know nothing unmarked so as not to lose any more points.
It is easy for students to cheat in conventional multiple-choice tests as there are only four or five options and a single correct answer. Students can not only see the answer sheets of their friends sitting nearby, but also communicate with their friends nonverbally to get the correct answer (Hughes, 1989; Madsen, 1983). It was observed that this is not the case in liberal multiple-choice tests. As students can mark as many choices as they want, more options are marked; thus, no matter how clearly a student sees a friend's answer sheet, he or she will not be able to tell which answer is correct and which is wrong. Besides discouraging blind guessing, this test format therefore also makes cheating quite difficult.
Though they are more advantageous, liberal MC tests have some limitations. For instance, applying these tests can be very demanding, especially in terms of instructing examinees. As examinees are new to the scoring method, there can be some problems regarding the reliability of liberal MC tests; in this case, the liberal MC test did not prevent blind guessing but penalized it. When examinees get used to the method and are well informed about it, they will be more careful about blind guessing: as they lose more and more points, they will stop marking options about which they have no idea. Another disadvantage of the liberal scoring method is that it is really difficult to score each question without any technological help; it can be very confusing for teachers to give 1 point for each correct option and -1/(N-1) points for each incorrect one. To solve this problem, a Microsoft Excel document (see Appendix B) is provided for the practical use of liberal multiple-choice tests (SAC method). Teachers can make use of this document and easily apply liberal MC tests in their classes. If a large volume of data has to be collected and processed within a short period of time, Optical Mark Recognition (OMR) is one of the fastest and safest methods for data entry: examinees' answers to the test questions can be collected using an OMR reader and transferred into the Excel document, and their scores can easily be computed. Once these problems are solved, test designers and policymakers at both national and international levels may make new decisions about using liberal MC tests in standardized achievement tests.
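As an alternative to the Excel workbook, the same calculation can be reproduced for a whole class in a short script. This is a sketch only; the student names, answer key, and data layout below are hypothetical, and the constants follow the 5-option, 4-points-per-question example used throughout.

```python
# Hypothetical answer key and class data: each student's set of marked
# options per question, e.g. as transferred from an OMR reader.
KEY = ["C", "A", "E"]
N_OPTIONS = 5
POINTS = 4
PENALTY = POINTS / (N_OPTIONS - 1)  # -1 point per incorrectly marked option

sheets = {
    "student1": [{"C"}, {"A", "B"}, set()],            # sure, hedged, blank
    "student2": [{"A", "B", "C", "D", "E"}, {"A"}, {"D"}],  # all marked, sure, wrong
}

def score_sheet(answers, key):
    """Total SAC score for one student's answer sheet."""
    total = 0.0
    for marked, correct in zip(answers, key):
        total += sum(POINTS if o == correct else -PENALTY for o in marked)
    return total

for name, answers in sheets.items():
    print(name, score_sheet(answers, KEY))
```

Wrapping this in a small script (or feeding the same logic a spreadsheet export) removes the error-prone manual arithmetic that makes the method hard to apply by hand.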