Simulated Computer Adaptive Testing Administration with Remote Proctoring for Offsite Assessment in Mathematics



Introduction
The world keeps evolving, facing changes triggered by various factors. The same is being experienced in the higher education sector owing to the new normal of remote teaching and learning precipitated by the Covid-19 pandemic. The pandemic has led to the shutdown of world economies for close to two years, with the end of the tunnel still far from sight while battling the third variant of the virus. The educational sector is also having its share of the blow from the Covid-19 pandemic, with higher institutions shut to avoid physical contact and reduce the spread of the virus. Many higher institutions have explored remote online education to circumvent the challenges of lockdowns, which comes with the need for off-site assessment to ascertain the extent to which learning has been achieved (Woldeab & Brothen, 2019). Remote proctoring is a viable solution, powered by relevant technologies, for ensuring that online assessments possess the qualities of a good assessment concerning test administration and security (EDUCAUSE Learning Initiative, 2016; Assessment Systems Corporation, 2021).

Literature Review
Mathematics is a compulsory school subject for all students throughout the basic school level. It is also a prerequisite for matriculation into almost all courses in the higher education sector. According to Atteh et al. (2017), Mathematics is widely regarded as one of the most important school subjects and a central aspect of the school curriculum in every society. The position of Mathematics as a gateway or cross-curricular subject shows its importance as the prime vehicle for developing students' logical thinking and the higher-order cognitive skills, such as critical thinking, creative thinking, problem-solving, and computational thinking, required for the fourth industrial revolution (4IR). The importance of Mathematics as a subject is also adduced to its relevance and application. Mathematics enables one to make the invisible visible, thereby solving problems that would be impossible otherwise (Michael, 2015).
Mathematics is perceived by society as the foundation for the scientific and technological knowledge cherished by societies worldwide and is seen as an instrument for political, socioeconomic, scientific and technological development (Hagan et al., 2020). To this end, more periods are allotted to Mathematics lessons throughout the world than to any other subject, and with teaching comes assessment for gauging the extent to which learning has been achieved. The complexity of factors that can influence Mathematics performance shows that high achievement in Mathematics is a function of both cognitive and non-cognitive variables, which has been extensively researched. Despite the benefits that the study of Mathematics offers, many students perceive it to be difficult (Kislenko et al., 2007; Lucas & Fugitt, 2009; Wasike et al., 2013; Mutodi & Ngirande, 2014); devote inadequate time to self-practice (Michael, 2015; Assen, 2020); are taught by under-qualified teachers (Ogbonnaya, 2007; Sa'ad, 2014); and endure under-resourced teaching and learning environments (Lillian & Josephat, 2018), resulting in poor performance in the subject. Thus, most students find it difficult to acquire the different mathematical skills and processes that are useful in their everyday lives (Hagan et al., 2020; Assen, 2020). It is commonly accepted that Mathematics is difficult, obscure, and of little interest to certain people (Eshetu et al., 2009). However, a study by Fraser et al. (2004) refuted "the pop-culture notion of widespread math phobia which refers to an American public that is largely intimidated by mathematics", presenting empirical evidence that it may hold less truth than is generally believed. The perception toward learning Mathematics and its implications for Mathematics assessment require attention from all concerned stakeholders. However, technology-enabled assessment for diagnosing students' learning problems is sparingly studied.
The importance of technology-enabled assessment becomes obvious considering current trends with the Covid-19 pandemic, which has necessitated Higher Education Institutions (HEIs) adopting fully online modes of teaching and learning. Online assessment is becoming popular during the Covid-19 pandemic. While the pandemic has its downsides for higher education, the technological revolution is described as a miracle that has opened up the possibility of diversifying conventional campus-based HEIs to online delivery modes, reaching out to previously inaccessible populations and allowing for higher student engagement in the teaching and learning process (Ndlovu & Mostert, 2014). However, measurable teaching and learning activity features should be linked with testable student outcomes (Organisation for Economic Co-operation and Development-OECD, 2010). The use of computers has continued to transform educational assessments, bringing about remarkable improvements in these assessments, be they teacher-based or standardised. To function in a mathematically literate way in the future, students must have a strong foundation in mathematics. A strong foundation involves much more than the rote application of procedural knowledge and should include conceptual understanding, making sense of, and applying mathematics. In other words, a strong foundation should help students make connections between concepts and see patterns throughout mathematics. Improved assessment technologies can reliably ascertain strength in this context.
Computer Adaptive Testing (CAT) is one such improvement: the difficulty of the items is adapted to the performance level of the candidate, resulting in enhanced accuracy in ability placement. CAT is an assessment improvement that presents items to candidates adaptively based on their ability levels (Thompson, 2011; Rezaie & Golshan, 2015). As such, CAT is regarded as an innovation in educational assessment suitable for Mathematics as a cross-cutting subject and relevant to the skill sets required for survival in the 21st century (Hasanah, 2020). Accuracy in ability estimation, together with increased test security, better control of item exposure, and better balancing of test content areas for all ability levels, makes CAT suitable for standardised testing (Linacre, 2000). Some administrative advantages of CAT are flexible test management and immediate feedback, which may motivate examinees; the use of items appropriate to ability level, which may reduce test anxiety; flexible testing schedules; and reduced testing time, with research showing that CAT can reduce testing time by 50%, resulting in cost-saving benefits (Wise & Kingsbury, 2000; Thompson, 2011; Ogunjimi et al., 2021). The pre-evaluation of CAT performance is also of relevance. Measurement precision and test security, paramount with off-site CATs, are regarded as the most important aspects, especially with high-stakes testing (Han, 2018). While these advantages are attractive to any assessment expert, they are only guaranteed if items are not over-exposed (Oladele, 2021).
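The adaptive selection at the heart of CAT can be illustrated with a short sketch of maximum-information item selection under a three-parameter logistic model; this is a minimal illustration with assumed item parameters, not the routine of any particular CAT platform:

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3-parameter logistic model."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability level theta."""
    p = p_3pl(theta, a, b, c)
    return (a ** 2) * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

def select_next_item(theta_hat, item_pool, administered):
    """Pick the not-yet-administered item with maximum information at the
    current ability estimate -- the core of adaptive item selection."""
    candidates = [i for i in range(len(item_pool)) if i not in administered]
    return max(candidates,
               key=lambda i: item_information(theta_hat, *item_pool[i]))
```

For an examinee currently estimated at θ = 0, the selector favours an item with difficulty near 0 over much easier or harder items, which is what produces the ability-matched tests described above.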
Off-site assessments are taken outside the immediate purview of invigilators. In this situation, examinees can take examinations regardless of their geographical location, made possible by online technologies. Off-site assessments eliminate test administration costs associated with printing, distributing, and scoring additional tests as more applicants complete them (Karim et al., 2014). With off-site testing comes remote proctoring, an alternative to on-site test administration. Remote proctoring is conducted using human proctors, also known as supervisors or invigilators. It was first introduced and championed by Kryterion in 2006 and began large-scale operations in 2008. Several other organisations have followed Kryterion's lead, including Software Secure, ProctorU, Tegrity, Respondus, ProctorCam, B Virtual, and Loyalist (Foster & Layman, 2013). Remote proctoring allows students to take an assessment at off-site locations while ensuring the integrity of the exam, achieved either through a web-based or online service that provides synchronous remote monitoring by human proctors or through a student-recorded video of behaviour during a test (EDUCAUSE Learning Initiative-ELI, 2016). Online proctoring is a form of digital assessment powered by software that allows course participants to sit examinations from any location securely and reliably. Monitoring software, video images and the ability to monitor the student's screen should prevent them from engaging in fraud. Online proctoring usually involves some form of electronic surveillance of the student and the student's computer screen during live examinations, archived for later retrieval (Surf Net, 2020; Winneg, 2020).
Remote proctoring also usually involves some effort to ensure that enrolled examinees are the ones taking the exam. Although a determined cheater can generally find a way to beat any system, online proctoring is seen as a solution to the weaknesses of traditional proctoring with regard to supervisor-sponsored cheating during standardised on-site examinations, a commercial venture compounded by the poor remuneration of on-site proctors (ELI, 2016; Davis et al., 2016). Some suggestions for providing adequate controls to discourage cheating with off-site testing are limiting the availability of an online exam, randomising questions, randomising the order of answers in objective questions, presenting questions one at a time, and allowing limited time for an exam (Cluskey et al., 2011). Watters et al. (2011) found that students perceive randomisation of questions as the single most effective deterrent to cheating on online exams. These controls are readily available on independent test administration platforms such as FastTest or those integrated into Learning Management Systems such as Blackboard, Google Classroom, and Desire2Learn, among others (Davis et al., 2016; Oladele & Ndlovu, 2021). Implementing these controls is gaining more importance now that the on-campus option for examinations has become slimmer with the enforced Covid-19 pandemic lockdowns in some countries (South African Disaster Management Act, 2020).
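Two of the controls suggested by Cluskey et al. (2011), question randomisation and answer-order randomisation, can be sketched in a few lines; the question record layout (`stem`, `options`, `answer` fields) is a hypothetical structure used only for illustration:

```python
import random

def randomise_exam(questions, seed):
    """Build a per-candidate exam form: question order is shuffled and, for
    objective items, the order of answer options is shuffled as well."""
    rng = random.Random(seed)  # per-candidate seed keeps each form reproducible
    form = []
    for q in rng.sample(questions, len(questions)):  # randomise question order
        options = q["options"][:]
        rng.shuffle(options)                         # randomise answer order
        form.append({"stem": q["stem"], "options": options,
                     "answer": q["answer"]})  # correct answer tracked by text
    return form
```

Because each candidate receives a differently ordered form, copying a neighbour's answer positions becomes far less useful, which is the deterrent effect Watters et al. (2011) report.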
Another reason for technology-based alternatives for test administration, such as online proctoring, is the availability of online proctors who are better trained, may be on career paths, and can detect cheating at least as well as on-site proctors. Remote proctoring leverages technology-based aides offering various means to increase security and prevent fraud: cameras and microphones; screen capture, which allows the proctor to view the student's screen; a lockdown browser used to control access to authorised applications; the ability to stop and start a test; PC logging, which allows proctors to see in detail what happens on the student's computer; and keystroke dynamics used to verify examinees' identity (Cavon, 2015; Surf Net, 2019). These technologies enhance the integration of monitoring into the online test administration process (Cavon, 2015). However, standards for certification, such as ISO 17024's requirement that proctors be independent of the testing outcomes, are often ignored in online proctoring in preference for cost savings, convenience, and resource availability online (Foster & Layman, 2013).
Like every two-sided coin, online proctoring has both advantages and disadvantages. Some disadvantages are stringent testing system requirements; the challenge and intrusiveness of setting up lockdown software and camera placement; lower security in the face of some threats, where being perceived as a less secure model may taint a programme; some vendors providing less-than-secure methods or technology; the use of cameras with limited views; delayed review of video recordings; proctor-related issues such as the ability to view test content and the inability to control a test session; and, of course, the high cost of some commercial proctoring services, which can be prohibitive for high student enrolments (Cavon, 2015; ELI, 2016). Despite these challenges, remote proctoring offers various solutions tailored to specific situations, such as online graduate programmes and holding standardised examinations flexibly in timing and location, such as CATs (Surf Net, 2016). Cavon (2015) described technology as a turning point for proctoring, a future where humans design a completely automated monitoring system while dealing with security breaches (Poskochinov et al., 2018). In administering a remotely proctored test, Professor (2020) recommended running at least one practice test so that students can familiarise themselves with the process of accessing and completing a remotely proctored test. While not all off-site assessments use pre-tests for examinations conducted in the online environment, sufficient guidelines are given before the live testing session (Human Resources Professionals Association, 2020). Employing test administration procedures that would further validate educational assessment efforts remains an important aspect of the teaching and learning process, relevant to mathematics education.

Theoretical Framework
This study was premised on the 3-Parameter Logistic Model (3-PLM) of Item Response Theory (IRT) for dichotomously scored responses. IRT explains an examinee's response to test items via a mathematical function based on their ability (Al-A'ali, 2006). The theory establishes the level of interaction of examinees with the items in the test, based on the probability of a correct response to an item (Magno, 2009). The 3-PLM IRT model adopted for this study considered the estimates of difficulty (b-parameter), discrimination (a-parameter), and guessing (c-parameter) concerning test administration.
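Under the 3-PLM, the probability that an examinee with ability θ answers item i correctly combines these three parameters in the standard logistic form:

```latex
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}
```

where $a_i$ is the discrimination parameter, $b_i$ the difficulty parameter, and $c_i$ the guessing (pseudo-chance) parameter, so that even an examinee of very low ability answers correctly with probability at least $c_i$.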

Statement of the Problem
Having moved teaching and learning online, the extent to which learning is being achieved, gauged through off-site assessments, begs for answers. If assessments are not secured while being administered remotely, the whole exercise becomes questionable and faces the threat of assessment invalidation. An empirical study revealed that remote proctoring decreased cheating and had no direct effect on test performance (Karim et al., 2014). Another study revealed that grades were significantly lower for students who were proctored using a remote proctoring service compared to students who were not (Davis et al., 2016). This finding is congruent with the studies carried out by Prince et al. (2009), who found significant differences in average test grade scores for tests taken electronically without a proctor compared to those administered using a remote proctor. A divergent view was presented by Woldeab and Brothen (2019), who reported no significant differences in students' final exam scores when comparing online proctored with in-person testing. However, the study uncovered test anxiety resulting in lower scores for students monitored by an online proctor than for those who were not.
Also, adequate research and documentation are needed to answer salient questions about CAT, which can be accomplished using computer simulations (Thompson & Weiss, 2009). Simulation studies are highly recommended for evaluating the performance of CAT administration. Given the item pool and simulee distribution, using CAT simulation to determine measurement precision is an important aspect of CAT performance evaluation (Han, 2018). Wang and Kolen (2001) and Gibbons et al. (2008) examined test administration differences between CAT and paper-and-pencil modes to ensure comparability. While the pandemic necessitated a quick and immediate transition to emergency online teaching and learning with little preparation, simulation studies would help a great deal in closing the gaps of non-preparedness for assessment to complement emergency remote teaching and learning (ERTL). This study examines remote proctoring as an emerging practice for ascertaining the efficacy of ability estimation and the validity of off-site assessments regarding test security. The study simulated CAT administration, focusing on item administration while varying the option of using pre-test items and how this impacts students' ability estimation and item exposure. The study's general objective is to provide simulated evidence for CAT administration while varying pre-testing options for assessing Mathematics, as a gap in the literature. To achieve this objective, the following research questions were raised.

Research Questions
1. How does CAT administration impact ability estimation when students are exposed to pre-test items or otherwise?
2. How does CAT administration impact test security with or without pre-test items?

Methodology
This study adopted a Monte-Carlo simulation approach for administering CAT with pre-test items, implemented following the three components of the conventional CAT algorithm: the item selection criterion, item exposure control, and a fixed test length of 30 items. One hundred pre-test items were systematically drawn from the pool of 1,000 items used for the study. Introducing pre-test items for the study is necessary as standard practice for remote proctoring, allowing students to get familiar with the test (Professor, 2020). The simulation was carried out using SimulCAT (Han, 2012). SimulCAT is appropriate, being specialised Monte-Carlo-based simulation software. Specifications for the simulated study are described in Table 1. Ethical statement: This study was exempted from obtaining informed consent by the Faculty of Education Research Ethics Committee of the University of Johannesburg because the data for the analysis were computer-simulated.
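A heavily simplified version of one Monte-Carlo CAT run can be sketched as follows; the difficulty-matching selection and fixed-step ability update are deliberate stand-ins for SimulCAT's actual item selection and estimation routines, and the item parameters are assumed for illustration only:

```python
import math
import random

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def simulate_cat(theta_true, pool, test_length, rng):
    """Administer one fixed-length CAT to a simulee: choose the unused item
    whose difficulty is closest to the running estimate, draw a Bernoulli
    response under the 3PL, and nudge the estimate up or down."""
    theta_hat, administered = 0.0, []
    for _ in range(test_length):
        i = min((j for j in range(len(pool)) if j not in administered),
                key=lambda j: abs(pool[j][1] - theta_hat))  # difficulty match
        a, b, c = pool[i]
        correct = rng.random() < p_3pl(theta_true, a, b, c)
        theta_hat += 0.5 if correct else -0.5  # crude fixed-step update
        administered.append(i)
    return theta_hat, administered
```

Repeating such a run over many simulees drawn from an assumed ability distribution yields the administration records from which precision and exposure statistics can then be computed.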

Results and Discussions
Research Question 1: How does CAT administration impact ability estimation when students are exposed to pre-test items and when they are not?
Ability estimation was assessed using descriptive statistics of the mean and standard deviation of the Standard Error of Estimation (SEE) values saved in the *.sca output file of SimulCAT, as shown in Table 2. As shown in Table 2, the mean SEE for CAT administered with no pre-test items was 0.1979, while CAT with pre-test items had a mean SEE of 0.1977. This result shows that exposing students to pre-test items before commencing the test yielded a higher SEE than when they were not. While Professor (2020) supported giving students pre-tests with remote proctoring, this may jeopardise ability estimation, especially with high-stakes testing, and resultantly water down the advantages of adaptive tests. This could justify not administering pre-test items in the standardised testing carried out by the Human Resources Professionals Association (2020). Further analysis was carried out by plotting the Conditional Bias statistics within each θ range saved in the *.sca output file of SimulCAT, as shown in Figures 1 and 2, respectively.
The result shows that the Conditional Bias for test administration with pre-test items was highly inconsistent, ranging between 0.057 and -0.18 (see Figure 1). At the same time, a high level of consistency was observed when no pre-test items were administered (see Figure 2). Administering CAT with no pre-test items for off-site CATs is seen here to guarantee the high level of consistency required in ability estimation for CAT to function optimally once tightly controlled at a theta level of ±2. This finding is strengthened by the submission of Fraser et al. (2004), who refuted the general opinion that students have a phobia of mathematics. Students' performance in examinations can be guaranteed once teachers prepare students adequately (Wasike et al., 2013) and students engage in adequate self-practice (Michael, 2015; Assen, 2020). Adequate self-practice would also go a long way toward reducing the test anxiety reported with proctored tests by Woldeab and Brothen (2019). Furthermore, the desperation to cheat would not arise with ability-level items in CAT administration. This gains more relevance with online testing, where the watchful eyes of physical invigilators are absent.
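The conditional bias plotted in Figures 1 and 2 is simply the mean signed estimation error within successive θ bins. A sketch of that computation is below; the bin width and the θ range of ±2 mirror the control level mentioned above but are otherwise illustrative assumptions:

```python
def conditional_bias(true_thetas, estimates, bin_width=0.5, lo=-2.0, hi=2.0):
    """Mean signed error (theta_hat - theta) within each theta bin; values
    near zero across all bins indicate consistent ability estimation."""
    bins = {}
    for theta, est in zip(true_thetas, estimates):
        if lo <= theta < hi:
            k = int((theta - lo) // bin_width)        # which bin theta falls in
            bins.setdefault(k, []).append(est - theta)
    return {lo + k * bin_width: sum(v) / len(v) for k, v in sorted(bins.items())}
```

A plot of these per-bin means against the bin midpoints gives the kind of conditional-bias curve the study reads off the SimulCAT output.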
Research Question 2: How does CAT administration impact test security with or without pre-test items?
Test security was assessed using the item usage values saved in the *.scu output file of SimulCAT, shown in Table 3. As shown in Table 3, the maximum observed item exposure counts with and without pre-test items were 2,461 and 2,477 (out of 10,000 test administrations/simulees), respectively. This result shows that administering CAT with no pre-test items had a lower item exposure rate, signalling a more secure CAT administration, a major goal of remote proctoring (ELI, 2016). This finding is particularly important for off-site CATs, where examinees are not directly monitored. Therefore, CAT administration with lower item exposure rates guarantees test security, which would complement remote proctoring efforts such as computer/system lockdowns and keystroke monitoring to validate remotely proctored assessments (Cavon, 2015). The integrity of online examinations through remote proctoring would be further strengthened.
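The maximum observed exposure rate behind Table 3 is the administration count of the most frequently used item divided by the number of simulees (e.g. a count of 2,461 out of 10,000 simulees corresponds to a rate of roughly 0.25). A minimal sketch of that computation:

```python
def max_exposure_rate(administrations, n_simulees):
    """Maximum observed item exposure rate: administration count of the
    most frequently used item divided by the number of simulees."""
    counts = {}
    for test in administrations:      # each test is a list of item ids
        for item in test:
            counts[item] = counts.get(item, 0) + 1
    return max(counts.values()) / n_simulees
```

Keeping this rate low spreads usage across the item pool, which is the exposure-control goal the test-security argument above relies on.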

Conclusions and Recommendations
Conclusions drawn from the study are in favour of CAT administration with no pre-test items. With more institutions moving their assessments online, which is rapidly becoming the new normal, there is a need to ensure administrative procedures that would further validate educational assessment efforts in terms of ability estimation and test security. It is therefore recommended that off-site CAT be administered with no pre-test items to aid in diagnosing students' learning problems in mathematics. CAT also ensures that ability-level items are served to examinees. This would go a long way in reducing students' cheating tendencies and the anxiety levels experienced with the subject of mathematics.

Table 1 :
Computerized Adaptive Testing Simulation Design using SimulCAT. *SimulCAT .sca and .scu outputs were analysed to answer the research questions.

Table 2 :
Test Administration for Ability Estimation

Table 3 :
Test Security with Test Administration options