Self-directed learning assessment practices in undergraduate health professions education: a systematic review

ABSTRACT Purpose The goal of this systematic review was to examine self-directed learning (SDL) assessment practices in undergraduate health professions education. Methods Seven electronic databases were searched (PubMed, Embase, PsycINFO, ERIC, CINAHL, Scopus, and Web of Science) to retrieve English-language articles published between 2015 and July 2022 that investigated assessment of SDL learning outcomes. Extracted data included the sample size, field of study, study design, SDL activity type, SDL assessment method, number of SDL assessments used, study quality, number of SDL components present (based on a framework the authors developed), and SDL activity outcomes. We also assessed relationships between SDL assessment method and the number of SDL components, study quality, field of study, and study outcomes. Results Of the 141 studies included, the majority of study participants were medical (51.8%) or nursing (34.8%) students. The most common SDL assessment method was an internally-developed perception survey (49.6%). When evaluating outcomes of SDL activities, most studies reported a positive or mixed/neutral outcome (58.2% and 34.8%, respectively). There was a statistically significant relationship between both the number and the type of assessments used and study quality, with knowledge assessments (median MERSQI score 11.5) being associated with higher study quality (p < 0.001). Less than half (48.9%) of the studies used more than one assessment method to evaluate the effectiveness of SDL activities. Using more than one assessment was associated with higher study quality than using a single assessment (mean MERSQI score 9.49; p < 0.001). Conclusions The results of our systematic review suggest that SDL assessment practices within undergraduate health professions education vary greatly, as different aspects of SDL were leveraged and implemented by diverse groups of learners to meet different learning needs and professional accreditation requirements.
Evidence-based best practices for the assessment of SDL across undergraduate healthcare professions education should include the use of multiple assessments, with direct and indirect measures, to more accurately assess student performance.


Introduction
Health professionals within a wide variety of fields are charged with using evidence-based medicine to evaluate, diagnose, and treat patients. In addition, health professions fields are constantly evolving. Indeed, Densen wrote that the estimated doubling time of medical knowledge was 50 years in 1950 and was projected to be just 73 days in 2020 [1]. Compounding this challenge, professionals' knowledge declines over time [2][3][4]. This loss of knowledge can have detrimental effects on the patient; it is therefore critical that education promotes skills associated with life-long learning. Because self-directed learning (SDL) is crucial for cultivating life-long learners [5], it is a key component of health professions education [6]. Self-directed learning allows health professionals to continue to grow and refine their knowledge in order to improve patient care and well-being [7].
The definition of SDL, coined by Knowles in 1975, is 'a process in which individuals take the initiative, with or without the help of others, in diagnosing their learning needs, formulating goals, identifying human and material resources for learning, choosing and implementing appropriate learning strategies, and evaluating learning outcomes.' [8] In SDL, the learning responsibility shifts to students, and they take on an active role in developing and initiating actions necessary to achieve their learning goals. Another concept known as 'self-regulated learning' shares some similarities with SDL and is sometimes used interchangeably with SDL. Both concepts require active engagement of the learner, choice and decisions regarding the learning strategies, and evaluation [9]. However, self-regulated learning commonly takes place in the classroom, originates from cognitive psychology, and focuses on the learning processes involved with a task [9,10]. Moreover, it relies on metacognitive and cognitive operations such as self-efficacy and self-awareness [9]. Interestingly, SDL requires self-regulated learning [9]. Studies have shown SDL is associated with increased confidence, self-efficacy, critical thinking, self-awareness, and autonomy [11]. Learners who practise SDL become more aware of their deficiencies and how to address them; health professionals who use SDL continue to grow and refine their knowledge in order to improve patient care and well-being [8,12].
Indeed, accreditation bodies of many health professions education programs require a form of self-directed or lifelong learning in the curriculum [8,[13][14][15][16][17][18][19][20][21][22]. For example, the Liaison Committee on Medical Education (LCME, 2021) and the Commission on Dental Accreditation (CODA, 2020) accreditation standards directly require SDL curricular components [20,23]. However, what can be defined as SDL varies across different health professions [8,24,25], and while some programs do not specify criteria or definitions for SDL, they still require that it be included in the curriculum. Moreover, the way in which SDL is assessed varies amongst and even within disciplines. As a result, a general consensus as to what activities constitute SDL and how SDL should be uniformly assessed amongst health professions is lacking [26]. SDL is frequently assessed in a variety of manners, including, but not limited to, changes in behaviour, knowledge-based assessments, and qualitative thematic analysis of student reflections [27][28][29][30]. All of these methods have the potential to be valid means to assess SDL. However, to our knowledge there is no comprehensive review of the assessment practices related to SDL, including the relationship with learning outcomes [26]. This has led to a call for research into SDL outcome measures and methodology in order to advance SDL in medical education [31].
This systematic review was initiated to further the understanding of the assessment of SDL in undergraduate health professions education. It was designed to answer the research question: 'how is SDL assessed in undergraduate health professions education?' The primary objectives of this review were to: (1) determine the types of assessment methods used to evaluate reported SDL activities; (2) report the outcomes of SDL activities on student learning, based on the assessments identified as well as the components of SDL identified in a framework; and (3) determine whether there is a relationship between the identified SDL assessments and the number of SDL components, the study quality, the field of study, and the study outcomes.

Data sources and searches
This systematic review examines the types of assessment used to evaluate learning outcomes from SDL activities implemented in undergraduate health profession education programs. The review was prepared in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines [32]. Due to lack of a standard definition of SDL and a diversity of nomenclature related to assessment and health profession students, we worked with a health information specialist (MM) to develop search strategies that incorporated multiple keywords and index terms related to key concepts (e.g., self-directed learning, health profession students, assessment). We conducted comprehensive searches for literature published between January 2015 and July 2022 in seven online databases: PubMed, Embase, PsycINFO, ERIC, CINAHL, Scopus, and Web of Science. A preliminary literature search showed a more consistent upward trend in publications related to SDL around 2015. To ensure the most current literature related to SDL was incorporated, the search was limited to articles published in or after 2015. Unique index terms specific to each database were identified, if available, so search strategies tailored to each database were developed to optimise search retrieval (see Appendix 1 for a comprehensive search strategy with all search terms). Citation review (or hand searching by bibliography review) was performed by two investigators (SL, TT) to identify any additional studies. Literature search strategies were constructed based on PRISMA-S and the PRESS (Peer Review of Electronic Search Strategies) checklists [32][33][34].

Development of framework
While various health professions agree that SDL is of great importance, there is no agreed-upon standard of what defines SDL, how SDL should be taught, or how SDL should be assessed. We created an SDL framework drawing on a literature review of available accreditation standards and guidelines related to SDL requirements across different health profession education programs (Figure 1) [8,[13][14][15][16][17][18][19][20][21][22]. We used the framework to help develop our research objectives and to aid in identifying the SDL components and number of steps present in the SDL interventions reported in included studies. The framework comprises seven steps, each of which involves various components or activities (Figure 1) [8,[13][14][15][16][17][18][19][20][21][22]. The number of steps used during the described SDL activity was compared to the assessment(s) used as well as the reported outcomes of the assessment(s). Criteria to define SDL were not developed in this review; studies were included if the article authors stated that SDL was incorporated in a teaching or learning method.

Study selection
Search results retrieved from all databases were downloaded and imported into Covidence systematic review software (www.covidence.org), a web-based program for managing and streamlining a systematic review (MM). Since the review primarily focused on determining the types of assessment used to evaluate the core elements of SDL in health professions education and the outcomes of those SDL activities, studies were included if the authors reported use of SDL activities and assessed student learning outcomes related to those activities. Additionally, the students evaluated had to be enrolled in undergraduate health professions educational programs. The authors elected to include health profession programs whose curricula commonly cover evidence-based patient care, in which students are involved in learning about the diagnosis, evaluation, and treatment of different disease states. For this reason, undergraduate students were from health professions educational programs including medicine, nursing, physical therapy, occupational therapy, pharmacy, dentistry, physician assistant, optometry, chiropractic, and podiatry. Studies were only included if they were published original research in English, with full content. Articles were excluded if they were in languages other than English, investigated SDL assessment only in graduate medical education or other non-health professions education, or were publication types other than original research with empirical data.

Figure 1. Expanded self-directed learning (SDL) framework. These steps may serve as a framework for identifying SDL activities and/or SDL interventions discussed in research articles and assessment methods employed for studying outcomes of SDL interventions [8,[13][14][15][16][17][18][19][20][21][22].
Due to the wide variations in the definition and required components of SDL, all studies that reported the use of an SDL activity were included. Studies selected for the review contained at least one measure of assessing effectiveness associated with SDL, defined based on author declaration that SDL was utilised, and guided by the SDL framework created by the authors (see Figure 1) [8,[13][14][15][16][17][18][19][20][21][22]. Due to the differences in SDL definition, activities, and assessments used, the findings and outcomes of the SDL intervention described were solely based on the authors' reported data.
Two investigators screened titles and abstracts, and then full text articles for candidate studies, against the inclusion and exclusion criteria, in duplicate and independently (SL, TT). A third investigator served as a tie-breaker to resolve any discrepancies by consensus (MM). A tie-breaker vote was only required a handful of times.

Data extraction
We developed a standard data extraction form that was pilot-tested on five articles and extensively refined for the extraction process. Two investigators (SL, TT) worked independently to extract data from included studies, with any disagreements resolved through consensus. A third investigator (MM) was required to resolve disagreements a handful of times. Extracted data included the country in which a study was conducted, field of study, sample size, study design, description of SDL, SDL activities, assessment methods used to evaluate effectiveness of SDL (student perceptions including SDL readiness, knowledge examinations, etc.), outcomes or findings of SDL activities, and number of SDL components. Psychometric information on the reliability and validity of any SDL assessment used was also extracted if reported in selected studies. Data on outcomes or findings of SDL activities were coded based on the outcomes reported by the article authors. For example, if article authors measured the effect of an SDL activity using an internal knowledge exam instrument, we reported the outcome as positive if the article reported a knowledge gain due to SDL, and negative if it reported a knowledge decrease. If an article reported that student reactions to or perceptions of an SDL activity were low, we reported a negative outcome for SDL. The designation of a 'mixed/neutral' outcome was used if one assessment measure reported positive SDL outcomes while a second measure reported negative outcomes, or if the outcomes were neither positive nor negative. Studies that measured students' reactions and perceptions, including readiness scales and internally or externally developed surveys, were categorised as 'perception'; studies that used internal or external/standardised assessments to measure knowledge were categorised as 'knowledge'; and studies that utilised other approaches (for example, a qualitative approach or peer review) were categorised as 'other'.
Two investigators (KK, SL) examined and evaluated the SDL activities described in selected studies against the SDL framework (Figure 1) and extracted data on SDL components in these studies. A third reviewer (TT) reviewed the extracted data and resolved any disagreement by consensus a handful of times. Given that different components of the SDL steps are described in selected studies, we extracted information on the number of steps included in reported SDL interventions.
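The outcome-coding rules described above can be sketched as a small decision function. This is an illustrative reconstruction of the coding logic, not code used in the review; the function name and label strings are our own.

```python
def code_study_outcome(assessment_results):
    """Collapse per-assessment findings for one study into a single
    study-level label, following the coding rules described above.

    assessment_results: list of 'positive', 'negative', or 'neutral'
    strings, one per assessment measure reported in an article.
    """
    results = set(assessment_results)
    if results == {"positive"}:
        return "positive"      # every measure reported a gain
    if results == {"negative"}:
        return "negative"      # every measure reported a decline
    # Conflicting measures, or measures that were neither clearly
    # positive nor negative, are coded as mixed/neutral.
    return "mixed/neutral"
```

For example, a study whose knowledge exam showed a gain but whose perception survey was unfavourable would be coded 'mixed/neutral' under these rules.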

Quality assessment
The authors used the Medical Education Research Study Quality Instrument (MERSQI) to assess the methodological quality of included studies [35]. This tool has been widely used to assess study quality, and the resulting score has been shown to be predictive of the likelihood of publication [35][36][37][38][39]. The MERSQI is a 10-item tool that evaluates quality in six domains: study design, sampling, type of data, data analysis, validity of evaluation instrument, and outcome measures. Each item can have a maximum score of 3, and total scores range from 5 to 18, with higher scores signifying higher quality. We used this instrument to assess validity (content, relationship to other variables, and internal structure), sample response rate, number of institutions, complexity of the data analysis, appropriateness of statistical analysis, the study outcomes assessed (perceptions, knowledge, behaviour, and patient outcomes), and study design (single group, non-randomised, randomised, etc.). If an assessment measure was validated by means of experts, guidelines, theory, or existing validated surveys, the measure was considered to have content validity. A measure was considered to have evidence of reliability if factor analysis or a measure of reliability (internal consistency, interrater, test-retest, etc.) was reported. Two reviewers (KK, TT) performed the quality assessment in duplicate, with a third investigator (SL) resolving any disagreement a handful of times, to ensure consistency and rigour.
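As a rough sketch, MERSQI scoring can be thought of as a bounded tally across the six domains. The function below is a hypothetical illustration only: it tallies at the domain level rather than scoring the 10 individual items, and the example point values are invented rather than taken from the official rubric.

```python
# The six MERSQI domains named above; the per-domain cap of 3 follows the
# description in the text, but item-level scoring rules are not modelled.
MERSQI_DOMAINS = ("study_design", "sampling", "type_of_data",
                  "data_analysis", "validity", "outcomes")

def mersqi_total(domain_scores):
    """Sum the six domain scores after basic range checks (0-3 each)."""
    unknown = set(domain_scores) - set(MERSQI_DOMAINS)
    if unknown:
        raise ValueError(f"unknown MERSQI domain(s): {unknown}")
    for name, score in domain_scores.items():
        if not 0 <= score <= 3:
            raise ValueError(f"{name}: domain score must be between 0 and 3")
    return sum(domain_scores.values())

# Invented example scores for a hypothetical single-group study.
example = {"study_design": 1.5, "sampling": 1.5, "type_of_data": 1,
           "data_analysis": 2, "validity": 2, "outcomes": 1.5}
total = mersqi_total(example)  # 9.5, within the 5-18 range noted above
```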

Data analysis
Descriptive statistics (mean, median, mode, minimum, maximum, kurtosis, and skewness) were calculated for all quantitative variables (sample size, number of assessments, number of SDL components, and MERSQI scores) using https://www.socscistatistics.com/. Publication dates were examined for trends, if any.
The number of assessments used in each study was analysed to determine whether there was evidence of differences in study outcomes or characteristics based on whether a study used 1, 2, or 3 assessments of SDL. To determine whether studies had different distributions of SDL components based on the number of assessments, a Kruskal-Wallis test was used (to account for the ordinal nature of the number of SDL components). To test whether mean MERSQI scores differed by number of assessments, a one-way independent-measures ANOVA was employed. Lastly, Chi-square analyses were performed separately to test for an association between the number of assessments and field of study (nursing/medical fields) and for an association between the number of assessments and study outcomes (positive, neutral/mixed, and negative), respectively.
As a second variable of interest, the type of assessment (perception, knowledge, or other) was investigated. This part of the analysis included only studies that used a single assessment type, not two or more. In a similar approach to the above, a Kruskal-Wallis test was used to determine whether the distribution of SDL components differed between assessment types (perception, knowledge, or other). Again, a one-way independent-measures ANOVA was used to test whether mean MERSQI scores differed between assessment types. Lastly, a Chi-square test was run to test for an association between type of assessment and field of study.
All analyses were run using https://www.socscistatistics.com/. A Kolmogorov-Smirnov test for normality was run prior to the analysis. Two-sided tests were performed and an alpha of 0.05 was used. Pairwise comparisons were performed when applicable, and necessary steps were taken to account for multiple testing (Bonferroni/Tukey's HSD). Statistical analysis was performed by KK and reviewed by a biostatistician. All statistical tests performed are summarised in Table 1.
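The review's statistics were run on socscistatistics.com, but the same battery of tests can be sketched in Python with SciPy. The data below are invented for illustration only; the group sizes and counts do not correspond to the review's actual dataset.

```python
import numpy as np
from scipy import stats

# Invented example data: SDL component counts (ordinal) and MERSQI
# scores, grouped by whether a study used 1, 2, or 3 assessments.
components_by_n = {1: [4, 3, 5, 4, 3], 2: [4, 5, 3, 4], 3: [3, 4, 3]}
mersqi_by_n = {1: [9, 10, 8, 11, 9], 2: [11, 12, 10, 11], 3: [11, 12, 10]}

# Kruskal-Wallis test for the ordinal SDL-component counts.
h_stat, p_kw = stats.kruskal(*components_by_n.values())

# One-way independent-measures ANOVA for mean MERSQI scores.
f_stat, p_anova = stats.f_oneway(*mersqi_by_n.values())

# Chi-square test of association between number of assessments (rows)
# and field of study (columns: nursing, medicine); counts are invented.
contingency = np.array([[30, 39], [25, 34], [5, 8]])
chi2, p_chi, dof, expected = stats.chi2_contingency(contingency)
```

With real data, a Kolmogorov-Smirnov normality check (stats.kstest) would precede the ANOVA, and pairwise follow-ups would apply a Bonferroni or Tukey HSD correction, mirroring the procedure described above.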

Results
Of the 1,809 papers that met the initial inclusion criteria, 141 were included for review (Figure 2). Table 2 summarises the extracted variables: SDL assessment methods, a brief description of each SDL activity, the number of SDL components, learning outcomes, and the MERSQI score for all included papers. For a full listing of participant fields of study, sample sizes, countries, and study design types, see Appendix 2. Of note, only five (3.5%) studies contained all seven SDL components (see Table 2). Most authors did not mention the validity of the described instruments (see MERSQI, Table 3), but of those that did, 47 (33.3%) used validated instruments and five (3.5%) did not. Similar results were seen in terms of instrument reliability, with 64 studies (45.4%) reporting use of reliable instruments, seven (5.0%) reporting that reliability was not determined, and 70 (49.6%) failing to report reliability. No standardised assessment method was used. The most common type of assessment of SDL was a student perception survey developed by the researcher(s) (31.4%, perception-internal; see Table 2). The second most common assessment method was an internally-developed knowledge exam (23.3%). Most studies (58.2%) reported positive SDL learning outcomes, with 49 studies (34.8%) reporting mixed or neutral SDL outcomes and only 10 studies (7.1%) reporting negative outcomes of SDL activities (Table 2).

Basic characteristics of the studies analysed
The studies that met the inclusion criteria were conducted across the spectrum of health professions education (Appendix 2). The majority of study participants were medical (N = 73, 51.8%) or nursing (N = 49, 34.8%) students. The numbers of studies that focused on pharmacy (N = 11, 7.8%), dental (N = 4, 2.8%), physical/physio therapy (N = 3, 2.1%), and optometry students (N = 1, 0.7%) were low. When we examined the publication dates of the studies that met the inclusion criteria, we found that SDL assessment studies did not appear to increase in popularity between 2015 and 2020. However, between 2020 and 2021 the number of studies spiked rapidly (presumably due to the COVID-19 pandemic and the resulting online learning) and remained high for the first seven months of 2022. The number of studies published during the years 2015 to 2020 remained stable at 11-13 per year, but rose to 43 studies in 2021 and 22 studies from January to July of 2022. Included studies were from a variety of countries around the world (Appendix 2). Of the 141 included studies, 28 were from the United States, with South Korea being the second-most prominent source with 27 studies and India the third-most prominent with 21 studies. The two most common study designs were short case study designs and non-equivalent control group designs, with 40 and 25 studies respectively. Descriptive data for all quantitative data collected (sample size, number of assessments, number of SDL components, and MERSQI scores) are summarised in Table 3.

Number of assessments
There were 69 studies with one assessment, 59 with two assessments, and 13 with three assessments. Studies with one assessment had a median number of SDL components of 4 with an interquartile range (IQR) of 5 to 3. Studies with two assessments had a median (IQR) of 4 (5 to 3), and studies with three assessments had a median (IQR) of 3 (4 to 3). There was no evidence of a significant difference in the distribution of the number of components based on the number of assessments. There was evidence of a significant difference with regard to study quality (MERSQI) based on the number of assessments [F(2,138) = 13.71, p < 0.001]. The mean scores for studies with one, two, and three assessments were 9.49 (2.07), 11.25 (1.86), and 11.27 (2.22), respectively. Post hoc testing using Tukey's HSD showed differences between the one- and two-assessment groups, and similarly between the one- and three-assessment groups (p = 0.003 and 0.004, respectively). Testing between the two- and three-assessment groups did not show evidence of a significant difference (p = 1).

Type of assessment
Among studies that utilised a single assessment type, 42 (60.9%) were classified as 'perception,' 13 (18.8%) as 'knowledge,' and 14 (20.3%) as 'other.' When looking at SDL components, a statistically significant difference in distribution was found between assessment types [H(2) = 10.74; p = 0.005]. Studies classified as perception had a median (IQR) of 3.5 (4 to 2), whereas studies classified as knowledge had a median (IQR) of 4 (4 to 3), and those classified as other had a median (IQR) of 5 (5 to 4.25). Pairwise comparisons using a Bonferroni correction determined that there was evidence of a difference between perception and other (U = 122; p = 0.001), but not between perception and knowledge (U = 225; p = 0.347) or knowledge and other (U = 52.5; p = 0.066).
There was a statistically significant difference in mean MERSQI quality scores between assessment types (p < 0.001). Studies classified as perception had a median (IQR) of 9.5 (10.5 to 7.6), whereas studies classified as knowledge had a median (IQR) of 11.5 (12.5 to 10.5) and those classified as other had a median (IQR) of 8 (10 to 7). Post hoc testing using Tukey's HSD showed evidence of statistically significant differences between knowledge and perception, as well as between knowledge and other (p = 0.004 and p < 0.001, respectively). However, there was no statistically significant difference between perception and other (p = 0.467). Moreover, there was no statistically significant evidence of an association between type of assessment and field of study (medicine, nursing; p = 0.209).

Discussion
This systematic review was initiated to begin addressing the need for a uniform understanding of assessment practices related to SDL in health professions education. Included studies revealed that different aspects of SDL were implemented to meet different learning needs. While the authors found wide variations in the reported SDL activities, the assessment methods used to evaluate the outcomes of these activities were more consistent and were primarily student perception surveys or knowledge exams (see Table 2). We also found a statistically significant association between the number of assessments used and study quality as measured with the MERSQI methodology [35], with the use of more than one assessment being significantly associated with higher study quality. However, we did not see a significant difference in study quality scores between studies that used two versus three assessments, likely due to the smaller number of studies with three assessments. Additionally, while there was no statistically significant relationship between the number of assessments used and study outcomes, or the number of SDL components identified in the study, the type of assessment used to measure the impact of SDL activities had a statistically significant relationship with study quality. The highest quality studies were statistically associated with the use of knowledge assessments (mean 11.3), while lower quality scores were associated with the use of student perception assessments (mean 9.25) and those assessments that we categorised as 'other' (mean 8.5). Student perception assessments in our categorisation included SDL readiness scales, student surveys developed by the researchers (perception-internal), as well as surveys that were previously published (perception-external). When looking further at the types of assessments that were categorised as 'other,' many involved student opinions or perceptions.
Examples included focus groups, interviews, reflective essays, peer or self-evaluation, skill confidence, and course evaluations (Table 2). The relationships between these variables were then examined further. The statistically significant relationship between the number of assessments used and study quality is an important finding, as a blend of assessment measures, including both direct and indirect assessments, is recommended to more accurately evaluate student achievement of learning outcomes [177]. Direct measures, such as objective exams, presentations, or papers, provide direct evidence of student learning with regard to knowledge and skills [177,178]. According to Suskie [177], these measures offer tangible, self-explanatory, and compelling evidence of what students have or have not learned. Indirect measures, such as reflections, course evaluations, student surveys, or focus groups, provide information about students' perceptions of their abilities [177,178]. While indirect measures may seem less credible since they do not provide a direct measurement of student performance, these assessments can provide important information beyond the achievement of learning outcomes [177]. For example, focus groups or reflections can provide knowledge about the learning process and experience itself, potentially providing actionable information for improvement beyond what knowledge-based outcomes may offer [179].
Less than half of the studies in this review used more than one assessment method to evaluate the effectiveness of SDL activities. It may be important to highly recommend the use of multiple modes of assessments, including direct and indirect measures, when developing an effective assessment approach to evaluate SDL outcomes. The use of multiple assessments is often considered essential for an accurate evaluation of health professions student learning due to various strengths and shortcomings for each type of assessment. Using multiple assessment methods at different time points can help provide a more accurate picture of student performance [180].
Many of the studies included in the review used student perception surveys, either alone or in combination with other assessment methods, to assess SDL effectiveness (109 studies, 77.3%; Table 2). According to Gabbard [181], this type of research often focuses on self-perceived improvements in knowledge or confidence in a certain area. There may be numerous reasons why self-perceptions are used in educational research, such as the relative ease with which the data can be collected and the importance of student satisfaction during their educational experiences [181]. Unfortunately, self-assessment is not always accurate, as students may not have a true awareness of their learning capabilities and understanding of the material [182,183]. Moreover, a recent study published by Kemp and colleagues [24] found that while medical students claimed to have used and to be familiar with SDL, they had not utilised aspects of SDL as defined by that study. This led the authors to speculate that students may not be able to identify components of SDL and that explicit education regarding SDL may be required [29]. Other studies in the analysis commented on the potential limitations of student perceptions. For example, Kastenmeier and colleagues [122] noted that a major limitation of their study was that many conclusions were based on survey data, which measured only student perceptions and not actual acquisition of medical knowledge and SDL skills. Kershaw and colleagues [102] commented that although they evaluated student perceptions, it would be helpful to also use formative feedback from the teaching faculty to improve the learning process, assessment design, and criteria used for evaluations. According to Ehrlinger and colleagues [183], students' perceptions of their abilities and characteristics can be unreliable; there is no clearly established relationship between a student's self-confidence and their actual knowledge or skills.
Dunning and Kruger [184] noted that students with less understanding of the material are actually more likely to misjudge and overestimate their own abilities [181]. Oftentimes, stronger students have more accurate perceptions of their learning, whereas struggling students can be falsely overconfident in their abilities [183][184][185]. For these reasons, solely using self-perception data to assess SDL effectiveness might not provide accurate information about a student's true ability to use SDL or the effectiveness of the SDL instructional strategy.
Several of the studies used knowledge-based exams that were created internally (developed by the researchers) to assess the effectiveness of SDL (53 studies, 37.6%; Table 2). Many were administered shortly after the activity and did not test for long-term retention of knowledge. This is not a surprising finding: many accrediting bodies identify required components of instruction and assessment throughout a curriculum, and institutions develop their own assessment methods and standards to meet these requirements. While it is common for academic institutions to develop their own assessment material, this often makes it more challenging to compare students across different programs [180].
One of the most common instruments used to evaluate SDL readiness was Guglielmino's SDL readiness scale, validated by Lucy M. Guglielmino in 1977 [186]. This instrument measures students' self-reported attitudes, abilities, and characteristics related to their readiness to engage in self-directed learning, and it is a type of student perception assessment method [101,186]. Despite its popularity, there have been many potential problems associated with the use of this instrument, including concerns about cost and validity [187][188][189][190]. There have also been concerns about the reliability of the instrument when utilised with more diverse populations [189,191,192]. Considering that Guglielmino's scale [186] is over 40 years old and may have issues related to construct validity and reliability when evaluating diverse populations, it may be time to develop a new instrument. Additionally, using older versions of scales may mean that researchers are not assessing elements related to recent innovations in education or healthcare, which may weaken the results and make them less accurate and/or reproducible. For example, given the advances in technology and changes in health professions education, including required competencies, a more updated evaluation tool may be more effective for measuring SDL readiness and effectiveness.
Overall, the components of SDL assessments often differ, as health professions programs operate in diverse contexts of practice. While core competencies such as professionalism may be similar across health professions, the expected levels of knowledge, application, and critical thinking are unique to each program. The instruments used likely differ because of the unique requirements of each profession.

Strengths and limitations
There were several strengths associated with this systematic review. The comprehensive literature searches were conducted by an experienced librarian across multiple data sources. In addition, an SDL framework (Figure 1) [8,13-22] was developed to help screen studies investigating SDL activities and outcome measures. Finally, this review examined SDL in multiple health professions programs, allowing for a more comprehensive review of the literature related to SDL assessment.
In terms of limitations, this systematic review was restricted to English-language articles with full text available, published since 2015; therefore, selection bias may have been introduced. Articles published in other languages, or grey literature (unpublished studies), may define SDL differently and may have used different assessment tools to evaluate learning outcomes from SDL activities or interventions. Additionally, these results may not be generalizable to health-care professionals: compared to students, professionals most likely have greater readiness for SDL, as they are more aware of their learning needs and have stronger decision-making and critical-thinking skills [193].
To our knowledge, this is the first systematic review to focus on assessment of learning outcomes from SDL interventions implemented in health professions education programs. The review shows that different components of the SDL steps were adopted in different health professions programs to meet different learning needs and professional accreditation requirements. We anticipate that an updated systematic review will be warranted in 3-5 years, as an increasing number of studies with strong research designs are published and the definition of SDL comes to incorporate agreed-upon, salient components for achieving optimal learning outcomes through SDL activities in health professions education. The review also uncovered a large number of studies with single-group designs, which limits the generalizability of their results and conclusions. While the review identified a wide range of assessment tools measuring learning outcomes targeting different components of the SDL framework, it indicates the need for further research using control-group designs with random assignment, validated assessments administered to large samples, and interprofessional education contexts. The mean study quality score placed the included studies in a range similar to previously published medical education studies that used this measure [38,194,195].

Conclusions and recommendations
Since effective SDL is important for lifelong learning and is considered a necessary skill by many accrediting bodies for health professions education programs, it must be assessed appropriately. To the authors' knowledge, this systematic review is the first to thoroughly describe the types of assessment methods used to determine the effectiveness of SDL activities in health professions education programs. While the majority of included studies evaluated SDL using student perception surveys and internally created knowledge-based exams, this review found that less than half of the studies used more than one assessment method to evaluate the effectiveness of SDL activities. Given the importance of SDL in health professions education, we recommend establishing evidence-based best practices for its assessment. In particular, we recommend using multiple assessments, with a blend of direct and indirect measures, to ensure the most accurate evaluation of the learning process and student performance [177,178]. This is especially important because many board certifying exams measure a student's knowledge and clinical skills but do not evaluate SDL skills, which underpin the lifelong learning required for effective patient care throughout a career. Future research should determine the most accurate methods to assess SDL activities; this may include the development of new instruments that focus primarily on the SDL process, including SDL readiness.

Acknowledgments
The authors would like to acknowledge Jacob Keeley for his help with the development and application of statistical methods. We are also grateful to Audrey Bell, CMI for her professional assistance with illustrations.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
The author(s) reported there is no funding associated with the work featured in this article.