Shake-up in the world of assessment: Impressions from the Ottawa Conference on Assessment from Down Under


Artificial Intelligence is everywhere
The major breakthrough of generative Artificial Intelligence (AI) resulted from a long process that began with text analytics and rule-based systems initiated in the fifties and the sixties. However, with its newfound availability and accessibility, generative Artificial Intelligence has abruptly shifted trust from our academic research institutions to the whole educational and public sphere: this requires a rethinking of what should be taught and a rapid transformation of our assessment methods, whether we like it or not.
The fact that general-purpose Artificial Intelligence systems, not specifically trained in medicine, successfully pass medical licensing exams [2] tells us more about how we assess than about Artificial Intelligence itself. The values that underpin today's approaches to assessment are finite knowledge (a static chapter or subject at one point in time), highly controlled learners (we test exactly what has been taught), individual performance, a suspicion-based relationship (teachers anticipate that students will cheat and students look on assessment as a tool to make them fail), linear and predictable outcomes, and reductionism (we test what is easy to test and condense it all to information-poor grades). These values are far removed from professional reality, and make our assessment culture increasingly at odds with the future clinicians we want. Assessment culture should reflect the distributed cognitive system in which students are constantly interacting with peers, teachers and tutors (by the way, the Health Professions Artificial Intelligence Tutor already exists) and the whole community, invariably surrounded by generative AI, social media platforms, instant messaging, online encyclopaedias and databases, and so on. Within this general framework, a valid assessment must be meaningful, active and collaborative.
Another aspect of Artificial Intelligence that was discussed was its ability to correct text-based exams, making short-answer questions and essays realistic for large cohorts. Also, the potential of Artificial Intelligence to evaluate all available data generated by learners during their training could support a review and decision process by a competence committee.

Exam formats
There can be no conference on assessment without a reflection on the two main formats: Multiple-Choice Questions (MCQ) and Objective Structured Clinical Examination (OSCE). They are here to stay in one form or another, but they are losing their exclusive role.

Multiple-choice questions (MCQ)
A closed-book exam where students have to rely only on their "biological" memory, isolated from external sources of information and interactions, is completely at odds with what is expected of a clinician, who looks things up and consults peers and experts when uncertain. There is no point in investing a huge amount of resources in an exam that a bot will likely pass with a high score! The focus should shift to how to work with, understand and be sceptical and creative about available data and information.

Objective Structured Clinical Examination (OSCE)
Introduced in the mid-seventies, Objective Structured Clinical Examinations aimed to reduce the number of variables affecting performance assessment by increasing standardisation [3]. They are reliable and demonstrate educational impact, but they are also extremely costly. Some see them as an "assessment factory": highly efficient but narrow in scope. Some institutions cancelled their OSCEs during the COVID pandemic and decided not to reintroduce them afterwards. Their rationale: at the end of the day, OSCEs only prepare students to do well in their final OSCE exams. They tend to induce bad habits of scattergun, robotic, formulaic racing and score chasing. Very different from the professional and compassionate patient care we wish for! New formats have been proposed that differ noticeably from the classic OSCE. One is an authentic and reliable adaptation of the Objective Structured Long Examination Record (OSLER) [4]: longer than an Objective Structured Clinical Examination, more authentic (sometimes real patients rather than simulated), with more time spent on communication skills and on approaching the patient as a whole, and with a subgroup of examiners more skilled in assessment and feedback. Another is the Assessment for Progression Exam (APEX) [5], with a flexible timetable, loose timing (no bell), a feedback phase and trained examiners. But whatever the format, the current technological leap is such that this type of assessment should focus much more on dimensions such as communication (often underweighted in the score calculation), case management and interprofessionalism, which is rarely the case.
Dr Bernard Cerutti, MPH, University of Geneva, Faculty of medicine, UDREM, Rue Michel-Servet 1, CH-1206 Geneva, bernard.cerutti[at]unige.ch

Entrustable Professional Activities
Entrustable Professional Activities (EPAs) were the subject of much debate, particularly in postgraduate education, where they are now well established. Evidence of their relevance can be seen, for example, in identifying struggling residents early in their training and giving them the support they need to improve and develop without wasting years of training. The difficulties of getting to grips with the concept of 'entrustment' were discussed, and the different types and approaches of competency committees/entrustment committees were explored. However, the EPA concept does not always seem to be fully understood (e.g. the difference between competency and skills, the necessity of multiple observations from multiple observers in multiple situations, the differences between ad hoc evaluation and entrustment decision), leading to confusion about the soundness of the assessment approach and the reliability of the judgements.

Programmatic assessment
In addition to Artificial Intelligence, Programmatic Assessment (PA) [6,7] was the other main thread of the conference. From a few undergraduate and postgraduate programmes that introduced Programmatic Assessment a few years ago (including the Master of Medicine at the University of Fribourg [8]), the concept is now spreading to many programmes. It is seen as the answer to many problems, including those raised by Artificial Intelligence. But behind the buzzword, the principles of PA are not always respected, and the incompatible summative game is still very much dominant. More than just using some assessment formats and labelling assessments as formative, PA is a different ecosystem of values and a different understanding of the roles of teachers and learners. High-stakes decisions must be based on triangulation of a rich set of information from multiple sources (e.g. longitudinal data, meaningful feedback on targeted learning activities), which implies a combination across formats, and a narrative rather than a purely numerical process.

Future perspectives for licensing exams
Assessment in medical education is not static and has evolved significantly over the years and decades [9]. The expansion of Artificial Intelligence and the promise of a much more comprehensive (covering all domains of competence) and valid assessment offered by a Programmatic Assessment approach challenge the single-shot format of licensing and certifying examinations (typically administered with Multiple-Choice Questions and an Objective Structured Clinical Examination) in terms of validity, efficiency and educational impact. The idea is that, at the very least, a change in content must be considered in the short term (see discussion above on MCQ and OSCE) and that, in the longer term, a combination (some even suggest a replacement) with more longitudinal and comprehensive approaches should be pursued.

Conclusion
Assessment is the Curriculum: this statement fully expresses the importance of assessment in education. It underscores the need to think and build meaningful approaches that support the expected learning to best prepare our students and residents for their professional activities. The Ottawa Conference provided a stimulating, sometimes confronting, moment in looking at what the future of assessment might look like. In any case, old assumptions and habits will certainly be shaken by the changes driven by Artificial Intelligence and many other factors.

Potential competing interests
Both authors have completed and submitted the International Committee of Medical Journal Editors form for disclosure of potential conflicts of interest. No potential conflict of interest related to the content of this manuscript was disclosed.