Medical student selection process enhanced by improving selection algorithms and changing the focus of interviews in Australia: a descriptive study

Purpose The study investigates the efficacy of new features introduced to the selection process for medical school at the University of New South Wales, Australia: (1) considering the relative ranks rather than scores of the Undergraduate Medicine and Health Sciences Admission Test and Australian Tertiary Admission Rank; (2) a structured interview focusing on interpersonal interaction and on concerns should the applicants become students; and (3) embracing interviewers’ diverse perspectives. Methods Data from 5 cohorts of students were analyzed, comparing second-year outcomes in the medicine program for 4 cohorts admitted by the old selection process and 1 cohort admitted by the new process. The main analysis comprised multiple linear regression models predicting academic, clinical, and professional outcomes from selection tools and demographic variables. Results Selection interview marks from the new interview (512 applicants, 2 interviewers each) were analyzed for inter-rater reliability, which identified a good level of agreement (kappa=0.639). No such analysis was possible for the old interview, since it required interviewers to reach a consensus. Multivariate linear regression models utilizing outcomes for 5 cohorts (N=905) revealed that the new selection process was much more effective in predicting academic and clinical achievement in the program (R2=9.4%–17.8% vs. R2=1.5%–8.4%). Conclusion The results suggest that the medical student selection process can be significantly enhanced by employing a non-compensatory selection algorithm, using a structured interview focusing on interpersonal interaction and on concerns should the applicants become students, and embracing interviewers’ diverse perspectives.

single tool used within a battery of selection tools [4].
The current study was conceived from the need to design a purposely tailored selection process addressing the overarching objectives of the Faculty's Selection and Re-admission Committee: selecting applicants who are most suitable for the medicine program; minimizing the number of admitted students who are not suitable; and minimizing the impact of socio-demographic biases on selection outcomes.
Before the design of the new selection process, research was undertaken to understand the effectiveness of the tools already used for the selection of UNSW medicine students. The research identified that previous academic achievement, measured by the Australian Tertiary Admission Rank (ATAR), has been the best predictor of academic achievement in Australian and New Zealand medical schools [3]. Furthermore, scores in the Undergraduate Medicine and Health Sciences Admission Test (UMAT) correlated positively with academic achievement in the medicine program, but to a lesser extent than the ATAR [3,5]. On the other hand, the selection interview, which had been used at UNSW since 2004, was ineffective at predicting outcomes in the program [6].
As a result, a new selection process was designed in 2017 (for the incoming 2018 cohort), based on the following principles: First, the selection tools must be based on the best available evidence from research in relevant fields. Second, the selection process needs to focus primarily on identifying concerns rather than strengths, since the number of applicants who have significant strengths across the domains assessed far exceeds the number of places available in the program. It is noteworthy that students who struggle or fail in the medicine program are among those who ranked in the top 8% of applicants in their cohort; thus, had concerns been identified earlier, more suitable candidates might have been selected instead. Third, the selection should minimize bias against applicants based on their socio-demographic background.
The ATAR marks used for selection include adjustments to accommodate disadvantage due to socio-demographic factors and disability, so this tool continued to be used in the new selection process. The adjusted ATAR is termed the "Universities Admissions Centre (UAC) Rank," and henceforth the term UAC Rank will be used. Similarly, the UMAT continued to be used as the most widely available medical aptitude test in Australia, with an established evidence base [5]. However, the selection interview was completely changed from a semi-structured interview, which aimed to identify strengths and motivation to study medicine, to a structured interview designed to elicit potential concerns. The domains of the new interview were determined by faculty academic and clinical staff from a list of specific attributes regarded as important for medical students and doctors, as well as from the literature on the psychology of decision making [7] and the design of interview questions [8].
The assumption was that every successful applicant should be able to quickly establish good rapport with any person, should be perceived as trustworthy and reliable, and should interact effectively both verbally and non-verbally. The differences between the old and new admission processes are described in Supplement 1.
The new selection process's algorithm utilizes a non-compensatory approach [9], whereby performance in each of the 3 selection tools is ranked, and the lowest rank among the 3 tools (including within the interview) takes priority in establishing the final selection rank. Thus, a high rank in one tool (e.g., UAC Rank) cannot compensate for a lower rank in another (e.g., the interview). Applicants need to perform well across all selection tools to be offered a place in the medicine program [9].
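As an illustration, the min-rank logic described above can be sketched in a few lines of Python. This is a hypothetical reconstruction for clarity only: the tool names match the paper, but the example scores and the tie-breaking by sum of ranks are assumptions, not the faculty's documented rule.

```python
from typing import Dict, List

def non_compensatory_rank(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Order applicants by their WORST per-tool rank (1 = best).

    Illustrative sketch only; ties on the worst rank are broken here
    by the sum of ranks, which is an assumption.
    """
    tools = {tool for s in scores.values() for tool in s}
    ranks: Dict[str, Dict[str, int]] = {a: {} for a in scores}
    for tool in tools:
        # Rank applicants on this tool: higher score -> better (smaller) rank.
        ordered = sorted(scores, key=lambda a: scores[a][tool], reverse=True)
        for r, applicant in enumerate(ordered, start=1):
            ranks[applicant][tool] = r
    # The worst (largest) rank across tools drives the final ordering,
    # so a weak result on any one tool cannot be compensated for.
    return sorted(scores, key=lambda a: (max(ranks[a].values()),
                                         sum(ranks[a].values())))

applicants = {
    "A": {"UAC Rank": 99, "UMAT": 99, "Interview": 50},  # spiky profile
    "B": {"UAC Rank": 90, "UMAT": 90, "Interview": 90},  # consistent profile
    "C": {"UAC Rank": 80, "UMAT": 80, "Interview": 95},
}
print(non_compensatory_rank(applicants))
```

Note how applicant B, who performed solidly on all 3 tools, is ordered ahead of applicant A, whose excellent UAC Rank and UMAT cannot compensate for a weak interview.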

Objectives
The purpose of this study was to compare the efficacy of the new medicine selection process, introduced for the incoming cohort in 2018, with that of the old selection process in predicting students' competency. This report comprises the results of a series of analyses of selection and assessment data for the 2014-2018 incoming cohorts. The 2014-2017 cohorts were admitted by the old selection process, whereas the 2018 cohort was admitted by the new selection process. The focus of this study is the performance of the 2 selection processes in predicting critical assessment outcomes of students completing phase 1 (years 1 and 2) of their medical program. Although this study uses data from one medical school, its significance is much broader: identifying "critical features" that can improve the student selection process in general may benefit other institutions and other health professions' education programs.

Ethics statement
This study was approved by the UNSW Human Research Ethics Advisory Panel (ref: HC210373). No informed consent was obtained because the study is a retrospective analysis of admission process data. Data provided for the analysis had been standardized to maintain student anonymity and confidentiality as per ethical requirements. For the inter-rater reliability analysis of the new interview, the data consisted of the interviewers' marks only.

Study design
This is a descriptive study using administrative student selection data for 5 cohorts of students (2014-2018): one cohort admitted by the new process and the other 4 cohorts by the old process. The study is reported according to the STROBE Statement, available from: https://www.strobe-statement.org/.

Setting
Data used for this study were obtained from the administrative data repository of the UNSW Faculty of Medicine & Health. Assessment outcomes of interest were the 3 end-of-phase 1 assessment outcomes, assessed at the end of year 2 of the medicine program (phase 1 clinical and communications skills examination, phase 1 end-of-phase examination, and phase 1 portfolio examination). These data pertained to all applicants who secured a place and studied in the program, and were made available for analysis after all assessment scores were standardized (mean = 50; standard deviation [SD] = 5) to maintain student anonymity. In addition, for assessing the inter-rater reliability of the new selection interview, marks for all applicants interviewed for selection in 2018 were used, irrespective of whether the applicants were admitted to the program.

Participants
The data used comprised selection tool marks of applicants who applied to the medicine program at UNSW in the years 2014-2018. All applicants applied for domestic places either by the general or the rural admission pathway.

Variables
Variables included in the dataset are as follows: gender; selection process by cohort (old selection process, 2014-2017, or new selection process, 2018); selection pathway (metropolitan or rural); UAC Rank, which is the standardized secondary school total assessment score derived from the ATAR; socioeconomic status decile, derived from the applicant's reported home postcode; Interview Standardized Score; and UMAT standardized scores. Assessment outcomes were the phase 1 (end of year 2) standardized marks for the portfolio examination, the clinical and communications skills examination, and the end-of-year written examination.

Data sources/measurement
All anonymized data were obtained from the administrative data repository of the UNSW Faculty of Medicine & Health (Dataset 1). There was no measurement tool used.

Bias
No known biases in the data were identified.

Study size
This study used all the available data from 5 cohorts (N = 905). To identify medium effects, the minimum sample size required for a regression analysis with 6 predictors is 97 [10].

Statistical methods
Inter-rater reliability of interviewers' judgments (512 interviews in total) was analyzed for the 2018 incoming cohort. The data were analyzed under 2 different assumptions regarding the judgment data: first, that the data comprised 4 discrete categories, in which case kappa statistics were used; second, that the judgment data comprised an interval 1-4 scale, in which case the intraclass correlation coefficient (ICC) was used together with Cronbach's α. The 2018 data were also analyzed to identify the impact of gender on the selection tools, including the interview and the final selection rank, using independent t-tests. Since the selection algorithm generates a rank rather than a score, that rank was normalized to a score between 0 and 100 (mean = 50, SD = 12.9). It was not possible to compute inter-rater reliability for the old interview, since the data included only the score agreed by consensus between the 2 interviewers. The second part of the analysis consists of a series of linear regression models; each model predicts one of the 3 outcomes (phase 1 clinical and communications skills examination, phase 1 end-of-phase examination, phase 1 portfolio examination) from the following predictors: gender; socioeconomic status; selection pathway (rural/non-rural); UAC Rank; mean UMAT score; and interview score (for the 2014-2017 cohorts the old interview score, for the 2018 cohort the new interview score).
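To make the reliability analysis concrete, the following sketch computes Cohen's kappa (under the categorical assumption) and Cronbach's α (treating the 2 raters as items) from first principles. The rater judgments shown are hypothetical, and the study itself reports these statistics rather than any particular code; the ICC computation is omitted here for brevity.

```python
import numpy as np

def cohen_kappa(r1, r2) -> float:
    """Cohen's kappa for two raters' categorical judgments."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    categories = np.union1d(r1, r2)
    p_observed = np.mean(r1 == r2)
    # Chance agreement: product of each rater's marginal category rates.
    p_chance = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

def cronbach_alpha(ratings) -> float:
    """Cronbach's alpha for an (applicants x raters) matrix of scores."""
    X = np.asarray(ratings, dtype=float)
    k = X.shape[1]                        # number of raters ("items")
    item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical 1-4 judgments from 2 interviewers for 6 applicants.
rater1 = [1, 2, 3, 4, 1, 2]
rater2 = [1, 2, 3, 4, 2, 2]
print(round(cohen_kappa(rater1, rater2), 3))   # 0.769
print(round(cronbach_alpha(list(zip(rater1, rater2))), 3))
```

Kappa discounts the agreement expected by chance from the raters' marginal category frequencies, which is why it is stricter than raw percent agreement.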

Participants
Participants' mean age was 18.9 years, and 895 out of 905 (99%) were between 17 and 22 years of age. Men comprised 47.1% of the population. The distribution of the student population by gender did not differ significantly across selection years (chi-square = 5.524, P = 0.238), but overall there were slightly more women (52.9%) than men (47.1%) (Table 1). Students' UAC Rank, UMAT, and socioeconomic status also did not differ significantly across cohorts (Table 2).

Quality of the new interview
Data from 512 interviews were available for this analysis. The results demonstrated a good level of inter-rater reliability across the 3 statistics: kappa = 0.639; ICC = 0.895 (absolute agreement); and Cronbach's α = 0.895 (P < 0.001 for all these measures). Demographic data, which were available only for the interviewees who were offered a place in the program, demonstrated that the new interview performed better in terms of gender balance (Fig. 1), and no impact of socioeconomic status on interview scores was identified for either the old or the new interview (r = -0.041, P > 0.05 and r = -0.043, P > 0.05, respectively).

Comparison of the predictability of students' performance between the old and new selection processes
The focus of this study comprises a series of 6 linear regression models estimating the predictability of the 3 key assessment outcomes of phase 1 of the medicine program by the old and new selection processes (Tables 3-5). These results suggest that the new selection process is superior to the old selection process, both according to the impact of individual selection tools and by overall impact. Using the variance explained (R2) as a measure of model efficacy, the new selection process yielded an R2 3.13 times higher than the old selection process for predicting phase 1 clinical and communications skills examination outcomes (Table 3, Fig. 2), 2.11 times higher for predicting phase 1 end-of-phase examination marks (Table 4, Fig. 2), and 7.73 times higher for predicting phase 1 portfolio examination results (Table 5, Fig. 2). Comparing the impact of the new and old interviews on these outcomes demonstrates that the old interview had a negative impact on predicting phase 1 end-of-phase examination results and no significant impact on the other outcomes, whereas the new interview had a significant positive predictive impact on phase 1 clinical and communications skills examination results (Table 3). This pattern of the new selection process having a greater impact than the old is shown consistently across all 3 selection tools for all 3 outcomes, except for the UMAT scores, which had a negligible impact on the portfolio scores in both selection processes (Table 5).

Key results
Overall, the results of this study suggest that the new selection process is superior to the old selection process on several accounts. The predictability of all 3 critical outcomes by the new selection process was 2.1-7.7 times higher than the old selection process (Fig. 2), and the new selection process outcomes were not significantly impacted by demographic variables.

Interpretation
The main question is why the new selection process is so much better than the old selection process. It is suggested that the new selection process's algorithm plays a key role in its superiority. A recent study that modeled a non-compensatory approach for selecting applicants to interview for medicine at UNSW demonstrated that such a model raises the selection threshold for both ATAR and UMAT without significantly impacting the mean socioeconomic status score of the selected group [9]. In practice, the new selection process is more likely to de-select applicants who ranked low on one selection tool despite being ranked highly on the other 2 tools. Such applicants were likely to be selected under the compensatory nature of the old selection process. A non-compensatory selection model takes potential concerns into account, because a low rank on any selection tool ultimately has a greater impact on the chance of being offered a place in the medicine program than it would under the previous compensatory model. The merit of this concept has already been demonstrated: thresholds can be determined in selection tool scores that predict success in the medicine program, and in some instances these thresholds yielded large effect sizes (up to 1.20) [3].
Since the non-compensatory model did not negatively impact the socioeconomic status distribution of the selected cohorts, it is suggested that medical schools consider adopting such an approach, irrespective of the battery of selection tools employed.
The efficacy of the new selection process compared to the old selection process is also notable. In the old selection process, UAC Rank and UMAT had a relatively small predictive impact on phase 1 end-of-phase examination outcomes (beta = 0.145, P = 0.003 and beta = 0.039, P = 0.345, respectively), whereas the new selection process using the same tools is a much better predictor of outcomes (beta = 0.351, P < 0.001 and beta = 0.139, P = 0.136, respectively). Also, the overall variance explained by the regression model was much higher for the new selection process (R 2 = 17.8%) compared with the old selection process (R 2 = 8.4%). Since the ATAR, UMAT and student demographic characteristics had not changed across the 2014-2018 cohorts, the most plausible reason for the change in the efficacy of the models is the selection algorithm.
The new interview also plays an important role in the new selection process's superiority over the old selection process, as can be seen in the regression models (Table 3). The new interview had the largest impact on phase 1 clinical and communications skills outcomes (beta = 0.175, P = 0.023) among all predictors. By comparison, the old interview had a negligible impact (beta = 0.026, P = 0.527) (Table 3). This demonstrates that the selection interview is relevant to an important set of clinical skills required in the medicine program. Moreover, when predicting outcomes of the phase 1 end-of-phase examination, which represents academic performance, the new interview had no significant impact on that outcome (whereas, in comparison, the old interview was negatively associated with it: beta = -0.116, P = 0.004). Adding to this the good inter-rater reliability yielded by the new interview (kappa = 0.639; ICC = 0.895; Cronbach's α = 0.895), which is within the top range of inter-rater reliability reported for such interviews [11], provides supporting evidence for the validity of the selection interview. Thus, the interview is reliable, demonstrates predictive validity for a relevant outcome, and is not associated with irrelevant outcomes.
When looking at the third outcome, the phase 1 portfolio examination (Table 5), the difference between the regression models is even more interesting. First, the variance explained by the old selection process was a negligible 1.5%, whereas for the incoming 2018 cohort the variance explained was 11.6%. The portfolio examination outcome is constructed from students' academic and clinical activities, and it emphasizes students' reflection upon their accomplishments and challenges. The main difference between the new and old selection process regression models was that for the old selection process the only significant impact on the outcome was gender (beta = 0.090, P = 0.020). In the new selection process regression model, socioeconomic status had the largest impact (beta = 0.209, P = 0.029), followed by UAC Rank (beta = 0.172, P = 0.063) and the interview (beta = 0.118, P = 0.121). These results further strengthen support for the utility of the new selection process, yet the single demographic impact shown here requires further investigation to ensure that the new selection process is socio-demographically neutral and does not inadvertently discriminate against applicants.

Limitations
The most important limitation of this study is that it employs outcomes that are only 2 years into the medicine program. The original intention was to include assessment data from the end of phase 2 (year 3) in the analysis. However, these data were not usable because learning modes, assessments and grading systems were modified due to the coronavirus disease 2019 pandemic in 2020 and 2021. Consequently, those outcomes could not be compared with related outcomes of the cohort selected by the old selection process. Future analysis is planned to further determine the efficacy of the unique features in the new selection process in predicting advanced outcomes in the medicine program.

Conclusions
This study provides evidence to support the inclusion of several important features in the medical student selection process: a non-compensatory selection algorithm; a future-focused interview that aims to identify potential concerns should the applicant become a medical student and later a doctor; and an overall emphasis on minimizing the selection of less suitable applicants rather than identifying the top candidates. The top performers will always be selected, but for the benefit of students, educators, and our broader communities, we need to further minimize the number of unsuitable applicants securing a place in health professional programs.