Development of an online RAW achievement battery test for the primary level

Abstract An achievement test is a mechanism to measure students' knowledge and abilities. Numerous categories of achievement tests have been developed by different scholars and psychologists. Since they do not directly consider the curriculum adopted during students' course of study, they do not truly reflect students' achievements. We propose an achievement test which is computerized and is based on assessment of RAW (reading, arithmetic and writing) capabilities, considering the curriculum used for imparting education. We set compositions and contents according to age group and educational standards. We then conduct a series of experiments to show how an achievement test linked with a curriculum reflects a student's achievement index better than a general one. We call it the Online RAW Achievement Battery test, and we also develop an application which we use for conducting our experiments and formulating results. Finally, we analyze our results against students' historical records and WRAT-4, a well-known standardized test, and report our findings.


Introduction
Tests are used to judge students' learning capability, performance, and academic level. Tests fall into different categories, e.g. intelligence tests, personality tests, verbal and non-verbal performance tests, aptitude tests, and achievement tests. Each testing system needs some kind of general arrangement in order to assess and evaluate a student's performance level, and also learning disorders, through a standardized technique. While developing such tests, we define what precisely is to be measured, what types of score translations are required, what test design or blend of arrangements is required for a valid evaluation, and what methodology will be utilized, i.e. paper-pencil or computer-based (American Educational Research Association, 1999). Achievement tests play an important role in the evaluation of learning skills at any level of training.

PUBLIC INTEREST STATEMENT
We propose an achievement test which is computerized and is based on assessment of reading, arithmetic and writing capabilities, considering the curriculum used for imparting education. We also develop a computer application so that educational psychologists and therapists can easily and efficiently calculate students' performance scores in a computerized way to judge students' learning disorders. Our work is significant in the sense that the test is automated and our main focus is to design this achievement test for a specific age group.
We can measure students' knowledge, performance, and abilities in a standardized and systematic way through achievement tests (Gay, 1996). We can also improve the standard of education through them, as they provide proper feedback about student performance. However, it is pertinent to mention that most of these tests are based on western education systems and do not clearly reflect students' achievements in developing countries.
Standardized achievement tests play a critical role in providing objective feedback to educators in order to judge how much students have learned and understood. Educational institutions use assessment to judge learners' performance levels. An achievement test provides a snapshot of a student's performance; standardized tests serve as a tool to decide educational resources and deliver a helpful technique to measure learner progress. We propose an approach to evaluate students' performance considering the curriculum taught, to effectively help clinicians judge possible learning disorders. In order to develop the achievement test, we select textbooks for the selection of questions/items and store them in a databank. We work in close collaboration with psychologists to finalize the length of our test, the preparation of items, and the outcomes. In order to validate our proposal, we conduct expert judgment and pre-testing, which we call Test Round No. 1, to validate the items. We finally run our system and a well-known system, which we call Test Run No. 2, and reflect on the outcomes while concluding our results.
The rest of the paper is organized as follows: the methodology is described in Section 2 and the implementation is presented in Section 3. Section 3.2 presents validation and evaluation, Section 3.3 presents students' performance using RAW, and finally Section 3.5 presents the evaluation of Online RAW.

Our methodology
We present our methodology for the development of the achievement test as explained in Figure 1. In order to develop an achievement test, we first conduct data collection in which we select questions, the fundamental units of tests. We collect data from textbooks of primary level students and we call these questions "items." An item could be a multiple-choice question, a true/false question, a short-answer question, etc., and we select test items considering books of local as well as international origin (the content of items was taken from commonly taught material which was uniform among all levels of school systems and compatible with international standards of age-appropriate learning). While selecting items for our achievement test, we consider two subjects, i.e. English and Mathematics at the primary level, and cover two local textbooks and one international curriculum. The local textbooks are approved by the local education boards, whereas the international curriculum is based on the Maria Montessori method.
Once the books for content are selected, we perform content analysis to see whether the selection is truly representative of the complete curriculum and strong enough to meaningfully judge the ability of the subjects, i.e. students (a committee approach was used, comprising educational experts, i.e. teachers). The finalized selection of contents from the selected textbooks contains the following mixture: we classify test items into a selected-response questionnaire format and a constructed-response format. The former are those in which the participant has to choose the relevant answer from a list of answers, i.e. true/false or multiple-choice questions (MCQ). The latter are questions in which participants present an answer after making some calculation, i.e. completion items. We use selected-response questions to judge students' learning in English letter reading, English word reading and spelling tests, and oral mathematics. For assessing mathematical skills, we use the constructed-response format.

Development of achievement test
In order to develop items, we select a total of 30 textbooks from grade 1 to grade 5 curriculums, six textbooks of English and mathematics from each grade level. We cover three textbook versions; the textbooks for all grades are detailed in Table 1.
We assign weights to test contents so that we are able to assign difficulty levels and prepare subtypes. Each grade has a total of 50 marks, with the reading, arithmetic, and writing tests worth 20, 20, and 10 marks respectively. The questions include closed items, short question answers, descriptive questions, true-false items, and multiple-choice questions. We define the difficulty level for each test item as mentioned in Table 2. For item writing and development, we use a facet design for matching test items from different books, and on the basis of similar items we select our questions for the test.
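As an illustrative sketch (not part of the study's tooling), the per-grade mark allocation described above can be expressed as a simple structure; the subtest names and weights follow the text, while the code layout itself is an assumption.

```python
# Per-grade mark allocation described in the text: reading 20, arithmetic 20,
# writing 10, for a total of 50 marks per grade. The dictionary layout is
# an illustrative assumption, not the study's actual implementation.
SUBTEST_MARKS = {
    "reading": 20,
    "arithmetic": 20,
    "writing": 10,
}

def total_marks(allocation):
    """Total marks available for one grade's test."""
    return sum(allocation.values())
```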

Test items validation
After the item writing and development phase, we validate the test items. First we develop and use a questionnaire to conduct analysis with the help of experts who are trained primary level teachers and psychologists. This helps us analyze our test items through pre-testing. We then use statistical analysis tools and techniques for validation purposes.
In this section, we conduct a survey of the 250 test items to check item difficulty and conclude that, out of 250 test items, 26 (10.40%) are very easy, 85 (34%) are easy, 98 (39.20%) are average, 38 (15.20%) are difficult, and the rest are very difficult, as shown in Figure 2.
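The banding above can be sketched as a simple classifier over each item's proportion-correct; the five labels come from the survey, but the numeric cut-offs below are illustrative assumptions, not the thresholds used in the study.

```python
def difficulty_band(p_correct):
    """Map an item's proportion-correct (0.0-1.0) to a difficulty label.

    The five labels follow the survey; the cut-off values are
    illustrative assumptions only.
    """
    if p_correct >= 0.90:
        return "very easy"
    if p_correct >= 0.70:
        return "easy"
    if p_correct >= 0.40:
        return "average"
    if p_correct >= 0.20:
        return "difficult"
    return "very difficult"
```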
We accept or reject our test items on the basis of the results mentioned above; out of 250 test items, 29 were rejected and 221 were in the accepted range. We calculate the reliability of the 250 test items of the Online RAW achievement test through SPSS Statistics. Cronbach's alpha (internal consistency) calculated through SPSS is α = 0.780, as shown in Table 3.
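The internal-consistency figure reported above can also be reproduced outside SPSS. Below is a minimal sketch of Cronbach's alpha; the formula is standard, but the assumed data shape (a respondents-by-items score matrix) is an illustration, not the study's actual data pipeline.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # per-item sample variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)
```

With two perfectly consistent items (every respondent scoring the same on both), the function returns α = 1.0; real item sets, like the one reported here, fall below that ceiling.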

Evaluation
After construction of the achievement test, i.e. grade selection, content analysis, item writing and validation, and expert panel assistance through a survey, we evaluate the achievement test. In this section, we compare our RAW with WRAT-4 to find which gives a better result, and further compare both achievement tests (RAW and the Wide Range Achievement Test (WRAT)) with historical records. We select primary level students (number of students) through a random sampling technique from each grade, i.e. grade 1 to grade 5. We check their performance according to the standardized rules and criteria that we defined earlier and conclude that the overall performance of grades 1-5 in most tests is above 50% and below 95%. The students we tested fall into the average, above average, and excellent categories; detailed grade-wise percentages are shown in Figure 3.

RAW vs. WRAT-4
We conduct a sequence of experiments and compare our outcomes with WRAT-4, which is a well-known standardized achievement test (Wilkinson & Robertson, 2006). We select students from classes one to five and conduct assessment using both tests, i.e. RAW and WRAT-4. We compare the results of RAW and WRAT to see which performs better. We examine our results using the Pearson correlation in SPSS, which ranges from −1 to +1, such that negative values indicate negative correlation and positive values indicate positive correlation, as shown in Table 4.
Considering Table 4, a value of zero indicates that no relationship exists between the two groups, Sig. (2-tailed) indicates the p-value, and N represents the sample size. The Sig. value is 0.027 in letter reading, 0.343 in word reading, 0.259 in the spelling test, and 0.510 in oral mathematics; most of these values are greater than 0.05, indicating no significant relationship between RAW and WRAT in those tests, whereas a strong relationship exists between RAW and WRAT in the mathematical problems test. The performance score of RAW in the battery tests is 94% in letter reading, 88% in word reading, 74% in oral mathematics, 90% in mathematical problems, and 92% in the spelling test, whereas in WRAT students achieve 86% in letter reading, 72% in word reading, 56% in oral mathematics, and 82% in the spelling test, which is lower than the RAW scores; thus our novel automated achievement test is more effective and better organized than WRAT. RAW shows better results than WRAT because we considered the course curriculum of Pakistani textbooks and developed this test for a particular age group, i.e. primary level students aged 5-9 years, whereas WRAT was developed to check performance across all ages, i.e. 5-94 years. The performance scores of both achievement tests are shown in Table 5.
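The statistics discussed above (Pearson's r and the two-tailed Sig. value) can be sketched with SciPy's `pearsonr`, which mirrors the SPSS output; the paired score lists below are illustrative placeholders, not the study's raw data.

```python
from scipy.stats import pearsonr

# Hypothetical paired subtest scores for the two tests (illustrative only).
raw_scores = [94, 88, 74, 90, 92]
wrat_scores = [86, 72, 56, 82, 70]

# r is the Pearson correlation coefficient (-1..+1); p is the two-tailed
# p-value, reported by SPSS as "Sig. (2-tailed)". A p-value above 0.05
# would indicate no statistically significant correlation.
r, p = pearsonr(raw_scores, wrat_scores)
```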

Comparison of RAW and WRAT with historical data
As a second means of evaluating our proposal, we collect historical performance records of our subjects (the students selected for conducting the WRAT-4 tests and our proposed RAW). The results are presented in Table 6.
For this evaluation, we randomly select students of grade 5 and check their performance using both achievement tests, i.e. the RAW achievement test and WRAT, and furthermore compare their performance with previous school records. We observe that the overall performance under RAW is 84%, the historical record is 80.5%, and WRAT is 71%. There is a 3.5% difference between RAW and the historical results but a 9.5% difference between WRAT-4 and the historical record. We present a comparison of RAW vs. WRAT in Table 7, which is an 11×3 contingency matrix.
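A minimal sketch of the headline comparison, using the percentages reported above:

```python
# Overall grade-5 performance figures reported in the text.
raw_pct, historical_pct, wrat_pct = 84.0, 80.5, 71.0

# Gap between each achievement test and the school's historical record:
# the smaller the gap, the more closely the test tracks recorded performance.
raw_gap = abs(raw_pct - historical_pct)
wrat_gap = abs(wrat_pct - historical_pct)
```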

Results and discussion
We find a number of standardized achievement tests, but the majority of them are not computerized and virtually all of them are region specific. Our work is significant in the sense that the test is automated and our main focus is to design this achievement test for a specific age group.
In test item validation, we conducted a survey of 250 test items and accepted/rejected items on the basis of item difficulty level, concluding that out of 250 test items, 221 are in the acceptable region and 29 are in the rejected region. The reliability coefficient is basically the correlation between two sets of test marks (Cortina, 1993). Reliability can be checked in different ways, i.e. through Cronbach's alpha or the Kuder-Richardson-20 formula, and the reliability coefficient computed this way is called internal consistency reliability. We test reliability through SPSS software, obtaining α = 0.780, as shown in Table 3. First we checked the performance of students in a computerized way, i.e. RAW, and then also checked the performance of the same group of students through paper-pencil assessment, i.e. WRAT. We applied the Pearson correlation in SPSS to all the results achieved through both achievement tests and found that a significant difference exists between RAW and WRAT. In evaluation part I, i.e. RAW vs. WRAT, we found that the performance achieved through the RAW achievement test is 94% in letter reading, 88% in word reading, 74% in oral mathematics, 90% in mathematical problems, and 92% in the spelling test, whereas in the WRAT test the performance of students is 86% in letter reading, 72% in word reading, 56% in oral mathematics, 72% in mathematical problems, and 82% in the spelling test. On the basis of these results, we conclude that the RAW score provides a better reflection of students' achievements.
Perrone (2013) discusses "classroom level achievement tests: an essential part of the second language learning and teaching processes" and demonstrates that an achievement test is mainly used in building classroom-level assessments and is particularly intended with reference to course goals and learning objectives that are set according to the curriculum; the researcher expresses that an achievement test measures student performance in a particular domain and for a particular grade level. The study reviews three main resources that can be followed as a course outline, i.e. books, syllabus, and course objectives, and also describes the limitations and benefits of the achievement test, as indicated in Perrone (2013). Another similar study (Zunaira Fatima, 2015) presented the development and Rasch analysis of an achievement test for master's level in the subject of philosophy of education: the scholars developed 60 test items, performed item analysis through the Rasch analysis technique, refined item properties using the latent continuum, and computed test item probabilities using the Rasch model. A further study presents a "Comparison of the Gates Reading Survey and the Reading Section of the WRAT" (Fortenberry & Broome, 2015); the major purpose of WRAT is to measure performance in three major areas, i.e. spelling, reading, and arithmetic problems, in order to evaluate an individual's strengths and weaknesses, and the main purpose of that study is to find the relationship between a group reading test and an individual test.

Related work
Researchers have adopted systematized and structured approaches to evaluate and present an overview of the educational, psychological, and academic profile of an individual (Sax, 1997). The Alabama Reading and Mathematics Test (ARMT) is an achievement test developed in the state of Alabama to check performance in English word reading and mathematical problems for grade 3-8 students; researchers utilized a paper-pencil assessment approach to judge performance level (Kenneth & Meier, 2013). The Ohio Achievement Test (OAT) is a standardized test created in the state of Ohio that is intended to judge the strengths and weaknesses of students and to judge English proficiency for grade 6-8 students; a subcategory is the Ohio Graduation Test (OGT), intended to check performance in science and social studies of tenth grade students. In 2009, work on the writing test was cancelled and social science was introduced. Researchers check students' performance and reading capability using a paper assessment method (Ohio, 2009).
There are a number of achievement tests developed in different states, i.e. ARMT in Alabama, SBA in Alaska, AIMS in Arizona, STAR in California, CSAP in Colorado, FCAT in Florida, ISAT in Idaho, MSA in Maryland, MAP in Missouri, MONTCAS in Montana, NMSBA in New Mexico, WRAT in Wilmington, NYSTP in New York, OAT in Ohio, and WCAP in Washington, but no standardized achievement test has been developed together with a battery and an application. We develop an achievement test and propose a technique for test item development to check students' performance.

Conclusion
We develop an achievement test, name it the "Online RAW test," and create a battery for the test. We also develop an application for this achievement test so that educational psychologists and therapists can easily and efficiently calculate students' performance scores in a computerized way to judge students' learning disorders. We develop test items from primary learners' textbooks to check student performance, as well as for course improvement through experts' opinions.
As an outlook, we plan to extend this achievement test to grade 6 to 10 students. We also plan to utilize psychometric principles and intend to add multiple subtests to the battery.