The relationship of individual examinee characteristics and acceptability of smart device-based testing to test score in the practice test of the Korea Emergency Medicine Technician Licensing Examination.

PURPOSE
Smart device-based testing (SBT) began to be introduced in the Republic of Korea's high-stakes examination system, starting with the Korea Emergency Medicine Technician Licensing Examination (KEMTLE) in December 2017. To minimize the effect of variation in examinees' environment on test scores, this study aimed to identify any variables related to individual characteristics and perceived acceptability of SBT that were associated with examinees' test scores on the SBT practice test.


METHODS
Of the 569 candidate students who took the KEMTLE practice test on September 12, 2015, 560 responded to a survey questionnaire on the acceptability of SBT after the examination. The questionnaire addressed 8 individual characteristics and contained 2 satisfaction, 9 convenience, and 9 preference items. Individual variables were compared between groups. Furthermore, a generalized linear model (GLM) analysis was conducted to identify the effects of individual characteristics and acceptability of SBT on test scores.


RESULTS
The mean score for preference for SBT compared to paper-and-pencil testing was higher among male participants (mean ± SD, 4.36 ± 0.72) than among female participants (mean ± SD, 4.21 ± 0.73). According to the GLM analysis, none of the variables evaluated, including gender and experience with computer-based testing (CBT), SBT, and using a tablet PC, showed a statistically significant relationship with the total score, scores on multimedia items, or scores on text items.


CONCLUSION
The variables of individual characteristics and acceptability of SBT did not affect the SBT practice test scores of emergency medicine technician students in Korea. Adoption of SBT for the KEMTLE should be possible to execute without interference from the variables examined in this study.


Introduction
Computer-based testing (CBT) has been successfully used for high-stakes medical health licensing examinations in the United States, Canada, and Taiwan. In the Republic of Korea, 24 medical health licensing examinations are managed by the Korea Health Personnel Licensing Examination Institute (KHPLEI). The KHPLEI decided to introduce CBT for the Korea Emergency Medicine Technician Licensing Examination (KEMTLE), which is one of these 24 examinations, starting in late 2017 [1,2]. The KEMTLE is the first professional licensing examination that will use SBT in Korea.
The KHPLEI began to administer CBT practice tests in 2014, and decided to introduce smart device-based testing (SBT), which involves the use of a tablet PC instead of a desktop PC. A tablet PC was chosen to avoid placing limitations on testing locations and the number of examinees. If a desktop PC is used for the exam, specifically equipped test centers would be needed, and the number of desktop PCs at the test center would limit the number of examinees. In contrast, using a tablet PC for the exam increases flexibility in the testing locations, and enables the administration of as many exams as the KHPLEI provides tablet PCs for. Therefore, in this report, we use the term SBT instead of CBT. Based on the results of the practice test scores and the questionnaire on examinees' perceived acceptability of SBT, it may be possible to identify individual characteristics and acceptability-related variables that affect test scores. If such variables are found, we would need to make an effort to minimize their effects in order to achieve comparability between SBT scores and conventional test scores.
In a recent study on SBT in Korea, satisfaction with, convenience of, and preference for SBT compared to paper-and-pencil testing were sufficient to determine that administering SBT was worthwhile [3]. In a focus group interview after CBT at a medical school in Korea, CBT was reported to be good for student learning because it strengthened the clinical context [4]. In another study, experience with computers and anxiety about computers did not affect the CBT test scores of health professions students [5]. At a medical school in the United States, content familiarity was found to be related to differences in performance, whereas gender, competitiveness, and familiarity with computers were not [6]. Although some evidence suggests that individual characteristics might affect CBT test scores, more extensive research is needed on the impacts of those characteristics and the perceived acceptability of SBT on SBT test scores. Therefore, we aimed to determine whether individual characteristics and perceived acceptability affected the test scores of examinees on the KEMTLE practice test using SBT. Specifically, we investigated whether individual characteristics affected the perceived acceptability of SBT and whether individual characteristics and perceived acceptability affected the test scores. The acceptability variables consisted of 3 subcategories: satisfaction with, convenience of, and preference for SBT. The null hypotheses of this study were as follows: first, variables relating to individual characteristics would not affect perceived acceptability; and second, variables relating to individual characteristics and perceived acceptability would not affect examinees' test scores.

Ethics approval
Students participated in the survey after providing written informed consent. This study was approved by the Institutional Review Board of Hallym University (HIRB-2015-092).

Study design
The study had an observational design based on test results and a questionnaire survey. A generalized linear model (GLM) analysis was conducted to evaluate the effects of individual characteristics and perceived acceptability of SBT on test scores.

Setting
The SBT KEMTLE practice test and questionnaire were administered to 569 candidate students (examinees) at the same sitting on September 12, 2015 in Daejeon, Korea. A smart device (a 10-inch tablet PC) was distributed to each examinee, and they marked their responses on the screen of the device. The test consisted of 50 multimedia items and 80 text items. Examinees were given 120 minutes to complete the examination. All items contained 5 options with 1 best answer. All 569 examinees who were present took the examination, and 560 of them responded to the questionnaire on the acceptability of SBT after the examination. The original questionnaire consisted of 8 items regarding individual characteristics, as well as 2 satisfaction, 13 convenience, and 16 preference items (Supplement 1), but based on the results of exploratory factor analysis, 9 convenience and 9 preference items were selected for this study. Items were scored on a 5-point Likert scale (1, strongly disagree; 2, disagree; 3, neutral; 4, agree; 5, strongly agree). The questionnaire was also administered on the tablet PC. The exam and questionnaire were not internet-based; instead, stand-alone tablet-based testing was used. After the examination and survey, the tablet PCs were moved to a separate location and the responses were transferred to a server. The collected data comprised the test scores of the examinees (Supplement 1) and their responses to the survey questionnaire. Fig. 1 presents a diagram of the study process.

Participants
A total of 569 examinees from the 41 emergency medicine technician schools in Korea were arbitrarily selected to be administered the practice test and the questionnaire on the perceived acceptability of SBT. They were in their final year of study (i.e., third-year students from 3-year programs or fourth-year students from 4-year programs). The total annual enrollment in the 41 schools was 1,400 based on a national regulation; therefore, the 569 participants corresponded to 40.6% of the target population. The characteristics of the participants are presented in greater detail in Table 1. Of the 569 subjects who took the examination, 560 participated in the questionnaire survey. Questionnaire validation was conducted using responses from 162 students, and responses from the other 398 students were used to test the null hypotheses.

Variables
The variables related to individual characteristics and perceived acceptability of SBT are listed in Tables 1-4. The examinees' test scores were considered to be the outcome. The variables for individual characteristics were treated as dichotomous values. The variables for acceptability were on a 5-point Likert scale. Test scores were a continuous variable.

Data sources/measurement
The source of all variables was response data from the survey questionnaire. The measurement methods were exploratory factor analysis for validity and the Cronbach alpha for reliability of the survey items.

Bias
There was no noteworthy source of bias in data collection or analysis. Nine of the 569 examinees did not respond to the acceptability questionnaire after SBT; this was low enough to have a negligible influence on the analysis.

Study size
The sample size (N = 569) corresponded to 40.6% of the total target student population, and examinees were drawn from 100% of the 41 emergency medicine technician schools; therefore, the sample size in this study was sufficient for the statistical analysis to be representative of the student population.

Quantitative variables
All variables were quantitative. They were subjected to a parametric analysis.

Statistical methods
Three procedures were conducted to test 2 null hypotheses. First, the survey questionnaire on the acceptability of SBT was validated and its reliability was confirmed; second, t-test analyses were performed to evaluate relationships between individual variables and perceived acceptability of SBT; and third, a GLM analysis was conducted to evaluate the effects of individual characteristics and perceived acceptability of SBT on test scores.
The 560 respondents were arbitrarily divided into 2 groups: one for survey validation (N = 162) and one for analysis using the t-test and GLM (N = 398). To confirm the validity of the questionnaire on the acceptability of SBT, exploratory factor analysis was conducted on the responses of the 162 examinees in the validation group, using principal axis factoring for factor extraction and varimax for factor rotation. Reliability was assessed using the Cronbach alpha.
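The reliability step can be illustrated with a minimal sketch of the standard Cronbach alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name `cronbach_alpha` and the response data below are hypothetical and are not taken from the study; the analysis reported here was run in SAS.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach alpha for a list of respondents' item scores.

    responses: list of rows, one row per respondent, each row holding
    that respondent's score on every item of a single scale.
    """
    k = len(responses[0])  # number of items in the scale
    if k < 2:
        raise ValueError("at least 2 items are needed")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 5-point Likert responses: 5 respondents, 3 items.
example = [[4, 5, 4], [3, 3, 4], [5, 5, 5], [2, 3, 2], [4, 4, 5]]
print(round(cronbach_alpha(example), 3))
```

Values close to 1 indicate that the items of a scale vary together, which is the sense in which the coefficients reported for the SBT evaluation survey are read as strong internal consistency.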
To test the null hypotheses, t-test analyses were performed on the results of the questionnaire on the acceptability of SBT and on test scores according to the background variables of gender, age, type of university, and experience with CBT, SBT, and use of a tablet PC. Test scores on the KEMTLE were used as the dependent variable. The KEMTLE used for the practice test was composed of 130 items, including multimedia items and text items.
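As an illustration of the two-group comparison, the sketch below computes an independent-samples t statistic with a pooled variance, whose degrees of freedom are n1 + n2 − 2. This assumes the equal-variance form of the test; the text does not state which variant was used. The function name `pooled_t` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, df) for an equal-variance independent-samples t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their df.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical Likert-scale preference scores for two small groups.
t_value, df = pooled_t([5, 4, 5, 4], [3, 4, 3, 4])
print(round(t_value, 3), df)
```

The p-value is then read from the t distribution with `df` degrees of freedom, which requires a statistics package and is omitted here.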
To determine the effect of individual characteristics and perceived acceptability of SBT on test scores, 3 different GLM models were analyzed using 3 different sets of test scores as dependent variables, with the same independent variables that were analyzed using the t-test. More specifically, GLM analyses were conducted of test scores on all 130 items (total scores), test scores on the 50 multimedia items, and test scores on the 80 text items. For this study, 14 variables were available: 6 categorical variables related to individual background characteristics, 5 factors from the questionnaire regarding perceived acceptability of SBT, and the 3 different types of test scores. The factors relating to perceived acceptability of SBT and the test scores were continuous variables. For the GLM analyses, examinees' characteristics, which were used as independent variables, were selected based on the t-test results. Furthermore, 3 composites derived from the questionnaire on the acceptability of SBT were employed as independent variables (satisfaction with SBT and the convenience of each of 2 SBT features, namely item-solving and the interface), as well as 2 factors related to preferences for SBT compared to paper-and-pencil testing and compared to CBT. SAS ver. 9.4 (SAS Institute Inc., Cary, NC, USA) was used for the analysis.
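The modeling step can be sketched as follows, assuming that the GLM for a continuous test score amounts to an ordinary least-squares linear model with an intercept. This is a pure-Python illustration rather than the SAS procedure used in the study, and the function names, predictors, and data are all hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def fit_ols(predictors, outcome):
    """Return (coefficients, r_squared); coefficients[0] is the intercept."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(outcome) / len(outcome)
    ss_total = sum((y - y_bar) ** 2 for y in outcome)
    ss_residual = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    return beta, 1.0 - ss_residual / ss_total


# Hypothetical data: (gender coded 0/1, satisfaction score) -> total score.
predictors = [(0, 4.0), (1, 3.5), (0, 5.0), (1, 4.5), (0, 3.0), (1, 4.0)]
scores = [78.0, 74.0, 81.0, 80.0, 70.0, 79.0]
coefficients, r_squared = fit_ols(predictors, scores)
print(coefficients, r_squared)
```

In this framing, each of the 3 models reuses the same predictor rows and swaps only the outcome (total, multimedia, or text-item scores).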

Results
Descriptive data of participants
Table 1 shows the number of examinees who responded to the survey questionnaire based on their background, subdivided according to whether their responses were used for survey validation or for the t-test and GLM analysis.

Outcome
The outcomes of this study were 6 variables related to individual characteristics, the perceived acceptability of SBT, and 3 sets of test scores.

Validity and reliability of the acceptability questionnaire
Tables 2 and 3 present the results of exploratory factor analysis of the scale for the convenience of SBT features and the scale for preferences for SBT, respectively.
In addition to these 2 scales, overall satisfaction with using SBT was included in the SBT evaluation survey. The survey was composed of 3 scales: a scale for satisfaction with SBT (2 items), the scale for the convenience of SBT features (9 items), and the scale for preferences for SBT (9 items). The scale for the convenience of SBT features was composed of 2 factors (convenience related to item-solving, and convenience related to the user interface). The scale for preferences for SBT was also composed of 2 factors (preference for SBT compared to CBT and preference for SBT compared to paper-and-pencil testing). Table 4 shows the description, the number of items, and the Cronbach alpha coefficient of each scale in the SBT evaluation survey. The reliability of the scales and the factors in each scale ranged from 0.836 (convenience of SBT features relative to computer-based test) to 0.920 (preference for SBT). All scales and the factors in each scale showed strong internal consistency and a high level of reliability. Table 5 presents the descriptive statistics of the 8 variables related to test scores and the perceived acceptability of SBT.
The means and standard deviations of the survey scores by background variables were compared using t-tests. The mean results of the evaluation survey for every background category were 3.81 or higher (3.81 being the mean score for satisfaction with SBT among examinees who had no experience of using a tablet PC), and examinees reported high satisfaction with SBT, convenience of SBT features, and preference for SBT. Furthermore, the mean score for preference for SBT compared to paper-and-pencil testing among male participants (mean ± SD, 4.36 ± 0.72) was higher than among female participants (mean ± SD, 4.21 ± 0.73). The gender difference in preferences for SBT might have reflected gender differences in adaptability and favorable attitudes to using new information technology.
Thus, for the GLM analyses, we needed to confirm whether preferences for SBT or gender affected test scores. Tables 11-16 show the means and standard deviations of test scores by background variables and their t-test results. No statistically significant relationships were found for any background variables. The mean differences between categories of each background variable were small; for example, the difference between the total mean scores of males (mean ± SD, 78.13 ± 14.008) and those of females (mean ± SD, 77.02 ± 14.45) was 1.11.

Effects of independent variables on test scores
Based on the t-test results, gender was the only independent variable that showed a significant association with preference for SBT (t = 2.132, df = 396) and preference for SBT compared to paper-and-pencil testing (t = 2.076, df = 396). Gender was therefore included in the GLM analysis, and age and type of university were excluded. Experiences of CBT, SBT, and using a tablet PC were included in the models because they were closely related to the test methods. Table 17 shows an analysis of variance (ANOVA) summary table for total scores, scores on multimedia items, and scores on text items. The R² values of the ANOVA models for the 3 dependent variables were 0.024, 0.023, and 0.024; that is, the independent variables explained about 2% of the variation in each dependent variable. Table 18 shows the regression coefficients and the values for statistical significance; no variables showed a statistically significant relationship with test scores. Furthermore, the η² values of the independent variables were small, indicating that the effect sizes of the independent variables were small.
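For reference, the two effect-size measures mentioned above have the following standard definitions, where SS denotes a sum of squares from the ANOVA table and j indexes an independent variable. These are textbook formulas; the text does not specify whether the reported effect sizes were η² or partial η², so the second expression is an assumption.

```latex
R^2 = 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}},
\qquad
\eta^2_j = \frac{SS_{j}}{SS_{\text{total}}}
```

Under these definitions, an R² of 0.024 means that the model accounts for 2.4% of the total variation in scores, which is the sense of the "about 2%" stated above.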

Key results
Our main results are as follows. First, the variables related to individual characteristics did not affect the perceived acceptability of SBT by emergency medicine technician students in Korea who took the KEMTLE practice examination, except for effects of gender on preferences for SBT in general and preference for SBT compared to paper-and-pencil testing. Second, the variables related to individual characteristics, satisfaction with SBT, and convenience of SBT did not affect the test scores on the KEMTLE practice examination. The null hypothesis was not rejected; therefore, the adoption of SBT for the KEMTLE should not be a problem for emergency medicine technician students in Korea.

Limitations
A limitation of this study is that a comparability study between paper-and-pencil tests and SBT was not conducted. However, doing so would be difficult because multimedia items cannot be included in a paper-and-pencil test, and the scores of SBT including multimedia items cannot be compared directly with paper-and-pencil test scores.

Interpretation
Proficiency or experience with the test device may be a major discriminating factor that could affect the validity of the test. Our results showed no difference in the perceptions of SBT according to experience with SBT or CBT or experience with the use of smart devices. We also examined whether test scores varied according to perceptions of SBT and experience with SBT, CBT, or the use of smart devices. We did not find any significant differences in test scores depending on experience with CBT or SBT. The average SBT exam scores of examinees with experience of CBT and those with no experience were 79.00 and 76.74, respectively. The scores of examinees with and without SBT experience were 78.21 and 77.51, respectively. The average test score of those with experience using smart devices was 77.6, while that of those who were not current users was 71.3. Experience using a smart device therefore seems to have influenced the test score. However, very few students did not have experience using smart devices (18; 4.5% of all participants), so the results for experience with the use of smart devices should be interpreted with care.

Generalizability
The number of subjects eligible for this study was 1,400 from 41 emergency medicine technician schools. Of these students, 569 were selected for SBT and 560 (98.4%) responded to the questionnaire survey; therefore, the sample of this study can reasonably be considered representative of the total population of the emergency medicine technician students.

Conclusion
The 2 null hypotheses of this study were not rejected. SBT can be adopted for the KEMTLE without difficulties arising from the variables examined in this study.