Quantification of Entrustable Professional Activities to Evaluate Insight in Learning Progress: A Literature Review

Background: Entrustable Professional Activities [EPAs] are a proposed framework for the improvement of Competency-Based Medical Education [CBME]. This work presents a systematic literature review comparing methods that describe the quantification of EPA processes/algorithms. Methods: Scopus, PubMed, Medline, Embase, and Google Scholar electronic databases were searched for relevant studies [published between 2005 and 2019] using the following keywords: "entrustable professional activities", "observable practice activities", "proficiency level", "measurement" OR "progress", AND "unit of performance". Structured forms were used to extract the relevant information from the selected publications, mainly under the areas of milestones, levels of entrustment, proficiency levels, units of practice and EPA score measurements. Agreement between the reviewers was calculated using Cohen’s kappa. Results: Three articles were deemed eligible for inclusion out of 114 records screened. Cohen’s kappa, which measures agreement [among reviewers], varied between 0.11 for proficiency level and unit of practice and 1.00 for milestones and measurement scores. Discussion/Conclusion: The reviewers agreed that the three methods in the reviewed publications describe the milestones and measurement scores of EPAs adequately. However, they disagreed regarding the proficiency levels and units of practice described. The results of this study are beneficial to educators and researchers engaged in EPA quantification and parameter development.
Alsahafi A, Newell M, Kropmans T MedEdPublish https://doi.org/10.15694/mep.2021.000100.1


Introduction
Competency-Based Medical Education [CBME] is a teaching framework that focusses on learning outcomes set out in a competency framework; trainees reach their learning goals by taking the time they need to achieve them (Ten Cate and Scheele, 2007). Higher educational institutions in different parts of the world are trying to implement CBME, which poses challenges for medical education (Hawkins et al., 2015). Evolving expectations of patients and populations have created a need to improve educational outcomes in order to achieve better clinical care and better educational programmes (Caverzagie et al., 2017). The ultimate goal of most undergraduate and postgraduate educational programs in medicine and healthcare sciences is to achieve better patient care (Pamela, 2018).
Competencies in CBME are an integrated combination of knowledge, skills and attitudes (Albanese et al., 2008). Although there are valid tests to measure knowledge and/or skills, assessing trainees' attitudes is a greater challenge (Carraccio and Burke, 2010). CBME is characterised by hard-to-measure outcomes, partly because of the vague language used to describe competencies (Ten Cate, 2013). Core clinical skills in each of the medical specialties, by contrast, are concrete, can be described, and should be reproducible by any undergraduate or postgraduate trainee independently, without supervision (Ten Cate and Scheele, 2007; Beeson et al., 2014).
Ten Cate (2005) therefore introduced the so-called Entrustable Professional Activities [EPAs]. EPAs are independent, observable professional activities, performed within a certain time frame, that reflect one or more of the required competencies (Ten Cate, 2005). EPAs are items of work that necessitate appropriate knowledge, skills, and attitudes. Assessing a trainee's clinical performance in the workplace requires several levels of supervision in different settings to achieve set milestones. Miller (1990) published a four-level pyramid describing educational progress. Each level of this pyramid is described by an action, starting from factual knowledge, represented by the verb "knows", followed by "knows how" and "shows how", and all the way up to "does", which demonstrates working-level competency. Each proficiency level (knowledge, skills and attitude) requires a specific method of assessment. EPAs equate to the "shows how" level and reflect milestones, that is, proficiency levels that require certain units of practice (Cruess, Cruess and Steinert, 2016). To achieve the milestones (i.e. proficiency levels) for all required units of practice, five levels of supervision were identified. The first level, 'pre-practice', is a basic understanding of the knowledge underlying the milestone to be achieved. The second level, 'close supervision', is a form of direct supervision in which the supervisor takes the trainee by the hand through the various steps involved in achieving that milestone. Each step can be seen as a 'unit of practice'; multiple units of practice are required to achieve a milestone. Once a milestone is achieved, direct supervision can be replaced by intermittent supervision. Indirect supervision is sufficient once the supervisor trusts the level of proficiency shown because the units of practice are being performed adequately and safely.
The fifth and last level is the one at which the trainee may provide supervision and instruction to junior learners (Ten Cate, 2005). Entrustability is the end result of a wide range of supervision across the various units of practice performed, with corresponding feedback between the trainee and supervisor, enabling the trainee to perform his/her tasks professionally and independently in the future (Hauer et al., 2014). The question that arises is how to assess this particular learning process.
Assessment in general can be either qualitative or quantitative. Qualitative assessment is associated with 'quality' and with gathering information that yields results that cannot easily be measured by or translated into numbers. It is often used when the subtleties behind the numbers are needed: the feelings, small actions, or pieces of community history that affect the current situation. Quantitative assessment, by contrast, offers a myriad of data collection tools, including structured interviews, questionnaires, and tests. These depend on the validity and reliability of the data collection (i.e. the measurement tools) and rely on numerical values acquired from statistical tests (Tavakol and Sandars, 2014). Many undergraduate and postgraduate programs are still struggling with how to quantify progress. As the decision of entrustment is a complex entity, current publications use various ways to qualify and quantify EPAs. The feedback provided by the supervisor is, in a way, the qualitative form of assessment of EPAs. Quantitative analysis in terms of 'units of practice', 'proficiency levels' and 'levels of required supervision' to quantify progress over the course of training is rarely implemented (Beeson et al., 2014; Loon et al., 2016).
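To make the three quantities named above more concrete, the following minimal sketch records units of practice against the five levels of supervision described in the text. It is an illustration only: the enum member names paraphrase the levels in the text, and the `UnitOfPractice` record, the helper functions and the sample log are hypothetical and do not come from any of the cited publications.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum


class SupervisionLevel(IntEnum):
    """Five levels of supervision, paraphrased from the text (names are assumptions)."""
    PRE_PRACTICE = 1
    CLOSE_SUPERVISION = 2
    INTERMITTENT_SUPERVISION = 3
    INDIRECT_SUPERVISION = 4
    SUPERVISES_JUNIORS = 5


@dataclass(frozen=True)
class UnitOfPractice:
    """One observed unit of practice for a given EPA (hypothetical record layout)."""
    epa: str
    level: SupervisionLevel


def units_per_level(log, epa):
    """Count the units of practice recorded at each supervision level for one EPA."""
    counts = Counter(unit.level for unit in log if unit.epa == epa)
    return {level: counts.get(level, 0) for level in SupervisionLevel}


def highest_level_reached(log, epa):
    """Return the highest supervision level recorded for one EPA, or None if no units exist."""
    levels = [unit.level for unit in log if unit.epa == epa]
    return max(levels) if levels else None


# Made-up sample log, for illustration only.
example_log = [
    UnitOfPractice("EPA-1", SupervisionLevel.CLOSE_SUPERVISION),
    UnitOfPractice("EPA-1", SupervisionLevel.CLOSE_SUPERVISION),
    UnitOfPractice("EPA-1", SupervisionLevel.INTERMITTENT_SUPERVISION),
    UnitOfPractice("EPA-2", SupervisionLevel.PRE_PRACTICE),
]
```

Counting units per level keeps the number of units of practice and the level of supervision as separate quantities, so either can be reported on its own.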
In 2014, Eric Warm asked which tools were appropriate for quantifying the progression of EPA entrustment levels among residents. He found no gold standard and therefore devised a first and so far unique approach. In his publication he combined levels of entrustment over time during program years 1, 2 and 3, and contrasted each resident with the class average per EPA over a number of assessments (Warm et al., 2014). Beeson, also in 2014, described milestones, proficiency levels and a scoring matrix. He mapped milestone sub-competencies for pharyngitis and chest pain to proficiency levels and then combined them in a single table to evaluate trainee progression (Beeson et al., 2014). In 2016, Eric Warm published another study on EPA quantification. Mean entrustment ratings were plotted for all the milestones over the 36 months of residency, from July 2012 through June 2015, for both attending assessors and peer assessors, mapping progressive entrustment over time. Core competencies were charted over the same period and showed an increase in all competencies over time among the residents (Warm et al., 2016).
The aim of this study is to perform a systematic literature review to compare various methods describing quantification processes/algorithms of EPAs; to improve quantitative assessment of professional activities; and to demonstrate visual progress.

Study design:
This literature review aims to provide a comprehensive understanding of current quantitative assessment of EPAs. It seeks to understand how EPAs can be quantified to measure progress, since measuring progress is a crucial element of assessment in undergraduate and postgraduate learning and is needed to distinguish 'good' from 'bad' performance. Furthermore, it aims to identify whether sufficient measurement of 'units of practice', 'milestones' and 'levels of supervision' contributed to the achievement of entrustment of one or more EPAs [after (Rowe, 2014)].

Search strategy:
Scopus, PubMed, Medline, Embase, and Google Scholar electronic databases were searched for relevant studies published between 2005 and 2019. The search strategy included the following text keywords: "entrustable professional activities", "observable practice activities", "proficiency level", "measurement" OR "progress", AND "unit of performance". We engaged a librarian from the National University of Ireland, Galway (NUIG), and recognised data expert, to guide our search strategy. To help reduce publication bias and to increase the comprehensiveness of the reviewed literature, the librarian helped identify additional grey literature in physical as well as electronic databases (Dart, Open Grey, Carrot2 and Embase). The reference sections of the included studies were also searched thoroughly for any additional publications. A bibliographical database was created using "Endnote version X8", which was used to store and manage the references.

Study selection:
Studies were eligible for inclusion if they included the term entrustable professional activities, reported on measuring proficiency level, described the performance units, and contained detailed graphs for the quantification of EPAs.
Studies were excluded if they were published in languages other than English or if they discussed EPA assessment only in general terms. Duplicate publications were removed. Titles and abstracts of articles were screened against the inclusion criteria above. The full texts of the shortlisted publications were then screened for eligibility. In total, 114 publications were assessed, of which only three matched the inclusion criteria. These papers were reviewed by three independent reviewers.

Data extraction:
Data extraction forms were used to extract all the relevant information from the included publications, including general information about the methods, year of publication, and specific information about the identification of milestones, describing proficiency level, stating units of practice and determining EPAs score measurements.

Statistical analysis:
Cohen's kappa was calculated as a measure of agreement between the reviewers, corrected for agreement by chance. Table 1 shows the interpretation of the levels of agreement between the reviewers using Cohen's kappa (Landis and Koch, 1977).
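As an illustration of the statistic, the following minimal sketch computes Cohen's kappa for two raters and maps the value onto the Landis and Koch (1977) bands. The function names and the example ratings are hypothetical and are not taken from the reviewed data. The sketch covers the two-rater case only, because the text does not state how the ratings of the three reviewers were combined into a single kappa.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who rated the same items with nominal labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout, so kappa is undefined (0/0).
        # Returning 1.0 is a convention chosen for this sketch, not a standard result.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def landis_koch_label(kappa):
    """Interpret a kappa value using the Landis and Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label
    return "almost perfect"


# Made-up ratings: "+" means "described adequately", "-" means "not described".
example_a = ["+", "+", "-", "+"]
example_b = ["+", "-", "-", "+"]
example_kappa = cohens_kappa(example_a, example_b)  # observed 0.75, chance 0.50
```

With these made-up ratings the observed agreement is 0.75 and the chance agreement is 0.50, giving a kappa of 0.50, which falls in the "moderate" band.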

Results
A total of 114 records were screened, with 3 articles deemed eligible for inclusion (for PRISMA diagram, see Figure 1). All included articles were published during 2014-2016. All the studies and reports were from the USA. Regarding agreement, Cohen's kappa varied between 0.11 for proficiency level and unit of practice and 1.00 for the milestones and measurement scores (Table 2).
Table 2 shows the agreement (plus sign) or disagreement (negative sign) of reviewers regarding the different papers, together with Cohen's kappa and confidence interval values. The reviewers unanimously agreed that milestones and measurement scores were described adequately in all the papers under review. Milestones and measurement scores are accordingly the areas with the highest agreement, with a Cohen's kappa of 1.00 and a confidence interval of [1.00, 1.00]. This corresponds to almost perfect agreement for milestones and measurement scores.
With regard to whether proficiency level is covered in the papers, the reviewers agreed less. All the reviewers agreed that Beeson et al. (2014) cover the proficiency level adequately, whereas for both papers by Warm et al. (2014; 2016) two reviewers marked the topic as not covered and the third reviewer disagreed with them. Proficiency level and unit of practice are the two areas with the lowest agreement among the reviewers, with a Cohen's kappa of 0.11 and a confidence interval of [-0.76, 0.98]. This can be classified as slight agreement. The unit of practice was deemed sufficiently covered by at least two reviewers in all the papers; however, one reviewer disagreed for Beeson et al. (2014) and Warm et al. (2014). For level of entrustment, the reviewers demonstrated moderate agreement, with a Cohen's kappa of 0.56 and a confidence interval of [-0.32, 1.00]. All the reviewers were in agreement regarding the two papers by Warm et al. (2014; 2016). For the work by Beeson et al. (2014), only one reviewer agreed that levels of entrustment are mentioned, while the other two disagreed.
We also looked at how each of the papers treats units of practice. Beeson et al. (2014) do not identify the units of practice required to reach the milestones they describe, nor do they mention or make observations about the actual units of practice that were necessary for each milestone. Warm et al. (2014), on the other hand, identify units of practice for each milestone, which they call "Number of assessments". They were not able to reach a conclusion about the optimal number of assessments for each milestone, but they derived an average recommended number of assessments from their observations (Warm et al., 2014). Warm et al. (2016) follow a similar procedure to calculate and plot the mean entrustment rating, using the average of a much larger number of assessments.
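The kind of aggregation described above can be sketched as follows. This is a minimal illustration and not the procedure used by Warm et al.: the record layout, the function names, the 1 to 5 rating scale and the sample values are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (resident_id, epa, residency_month, entrustment_rating 1-5).
example_assessments = [
    ("r1", "EPA-1", 1, 2), ("r1", "EPA-1", 1, 3), ("r1", "EPA-1", 12, 4),
    ("r2", "EPA-1", 1, 1), ("r2", "EPA-1", 12, 3), ("r2", "EPA-1", 12, 3),
]


def mean_rating_by_month(records, epa):
    """Class-wide mean entrustment rating per residency month for one EPA."""
    ratings_by_month = defaultdict(list)
    for _resident, record_epa, month, rating in records:
        if record_epa == epa:
            ratings_by_month[month].append(rating)
    return {month: mean(ratings) for month, ratings in sorted(ratings_by_month.items())}


def resident_vs_class(records, epa, resident_id):
    """Compare one resident's mean rating with the class mean for one EPA.

    The class mean here includes the resident's own ratings (an assumption).
    """
    class_ratings = [rating for _r, e, _m, rating in records if e == epa]
    own_ratings = [rating for r, e, _m, rating in records if e == epa and r == resident_id]
    if not own_ratings:
        raise ValueError("no assessments recorded for this resident and EPA")
    return {
        "n_assessments": len(own_ratings),  # the 'number of assessments' for this resident
        "resident_mean": mean(own_ratings),
        "class_mean": mean(class_ratings),
        "difference": mean(own_ratings) - mean(class_ratings),
    }


monthly_means = mean_rating_by_month(example_assessments, "EPA-1")
r1_summary = resident_vs_class(example_assessments, "EPA-1", "r1")
```

Plotting `monthly_means` against residency month would give a progression curve of the kind described for the 2016 study, while `r1_summary` contrasts one resident with the class average as in the 2014 study.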

Discussion
Our literature review aimed to assess current knowledge on the quantification of EPAs for evaluating learning progress, with competencies measured in a reliable and reproducible way that enables quantitative assessment of professional activities for any undergraduate or postgraduate trainee. We sought to understand how EPAs could be measured in terms of proficiency level, level of entrustment, units of practice and a final measurement score, and to gain insight into how to use and improve the measurement tool. We distilled evidence from three highly selected papers after a comprehensive search of articles published between 2005 and 2019.
These parameters require a quantitative as well as a qualitative approach to their assessment. With regard to quantifying EPAs in a standard way, this review has highlighted a lack of comprehensive studies to date.
Assessing "units of practice" was challenging because it is hard to estimate how many are required to reach a certain milestone. Underestimating the adequate number of units of practice could result in poor performance during assessments, while too many units of practice would be inefficient for both the supervisor and the trainee (Martinez, Phillips and Harris, 2014). Further research needs to be conducted using both qualitative and quantitative data to arrive at the optimal number of units of practice required for each EPA category.
With regard to milestones, it is interesting to note that all of the papers describe milestones specific to a certain branch or sub-field of medicine, as opposed to generalized milestones that can be applied to any EPA. Beeson et al. (2014) have 23 milestones specially formulated for Emergency Medicine, derived from the Dreyfus and Dreyfus model of proficiency acquisition (Dreyfus, Dreyfus and Athanasiou, 1986; Varkey, Karlapudi and Bennet, 2008; Beeson et al., 2014). Each milestone is accompanied by the related sub-competency and is mapped to a certain proficiency level. Warm et al. (2016) have milestone assessments for Internal Medicine that are derived from the Accreditation Council for Graduate Medical Education [ACGME] and the American Board of Internal Medicine [ABIM]. They do not explain or tabulate how each milestone is mapped to a specific sub-competency or proficiency level, as was done by Warm et al. (2014). Beeson et al. (2014) describe five levels of proficiency that are very specific to emergency medicine, whereas Warm et al. (2016) and Warm et al. (2014) also use five levels of proficiency, but these are broader and can be generalized to other fields of medicine because they focus mainly on entrustment and supervision.
To compare the rigor of the assessments that were recorded and studied, we looked at metrics such as the time under study, the number of participants and the number of assessments recorded. In Warm et al. (2016), assessment was performed over a period of 36 months. In the other study, by Warm et al. (2014), entrustment was captured for a period of only 12 months, which limits the tracking of progression between the different years. There is also a large difference in the number of assessments studied: in Warm et al. (2014) the number of assessments ranged from 4 to 106, whereas in Warm et al. (2016) it ranged from 1125 at the SBP-2 level to 53,723 at the MK-1 level. Warm et al. (2014) used assessments of 120 residents from the University of Cincinnati Internal Medicine Residency (UCIMR) over one year, whereas Warm et al. (2016) used 92 categorical and preliminary residents from the same UCIMR program and reported the outcomes over a much longer period of 3 years. Beeson et al. (2014) describe various milestones and their proficiency levels for EM. However, they do not assess participants or residents, because their study does not examine the actual achievement of milestones.
It is evident from the studies under review that there is no standardised approach to the quantification of EPAs. The methods that have been developed are very specific to their own domains. For example, Beeson et al. (2014) focused mainly on emergency medicine, while Warm et al. (2016) emphasized internal medicine. The use of standardized metrics in terms of proficiency levels is a potential method for improving the quantification process used to assess progress in learning. The suggested template for the quantification includes determining the proficiency level that has been attained from achieving the milestones.
Future research will be useful to determine the quality and validity of the quantification process. Broad definitions of milestones and similar units of practice will help standardize the assessment procedures and reduce the complexity of EPA quantification, enabling a holistic view for the trainees and making assessment easier for the assessors. The information collected in this review can help inform the creation of guidelines for quantification of EPAs in a standardized way. A systematic process of development will hold the key to the successful implementation of the proposed framework.

Conclusion
A systematic literature review was performed to compare methods describing the quantification of EPA processes/algorithms. Only three publications describing three quantification methods were included and assessed by independent reviewers. The reviewers agreed that the selected methods covered the milestones and measurement scores of EPAs (Cohen's kappa of 1.00 for both). However, the reviewers agreed only slightly on whether these methods covered the proficiency levels and units of practice of EPA quantification: a Cohen's kappa value of 0.11 was attained for both of these parameters. These results indicate areas that need to be developed further when devising quantification methods for EPA processes. The study also proposes a standardized approach to the quantification of EPAs with a clear description of each step involved.

Take Home Messages
This paper compares methods to quantify progress in EPAs.
EPAs are independent observable professional activities within a time frame and need to reflect progress over time.
Qualitative assessment is associated with gathering information on phenomena, while quantitative assessment deals with numbers or counts and depends on the validity and reproducibility of the EPA measurements.