Attitudes towards Interprofessional education in the medical curriculum: a systematic review of the literature

There is agreement among educators and professional bodies that interprofessional education needs to be implemented at the pre-registration level. We performed a systematic review assessing interprofessional learning interventions, measuring attitudes towards interprofessional education and involving pre-registration medical students across all years of medical education. A systematic literature review was performed using PubMed, PsycINFO, EThOS, EMBASE, PEDro and SCOPUS. Search terms were composed of interprofession*, interprofessional education, inter professional, inter professionally, IPE, and medical student. Inclusion criteria were 1) the use of a validated scale for assessment of attitudes towards IPE, and results for more than 35 medical students; 2) peer-reviewed articles in English and German, including medical students; and 3) results for IPE interventions published after the 2011 Interprofessional Education Collaborative (IPEC) report. We identified and screened 3995 articles. After elimination of duplicates or non-relevant topics, 278 articles remained as potentially relevant for full text assessment. We used a data extraction form including study designs, training methods, participant data, assessment measures, results, and medical year of participants for each study. A planned comprehensive meta-analysis was not possible. This systematic review included 23 articles with a pre-test-post-test design. Interventions varied in their type and topic. Duration of interventions varied from 25 min to 6 months, and interprofessional groups ranged from 2 to 25 students. Nine studies (39%) reported data from first-year medical students, five (22%) from second-year students, six (26%) from third-year students, two (9%) from fourth-year students and one (4%) from sixth-year students. There were no studies including fifth-year students. The most frequently used assessment method was the Readiness for Interprofessional Learning Scale (RIPLS) (n = 6, 26%). 
About half of study outcomes showed a significant increase in positive attitudes towards interprofessional education after interventions across all medical years. This systematic review showed some evidence of a post-intervention change of attitudes towards IPE across different medical years studied. IPE was successfully introduced both in pre-clinical and clinical years of the medical curriculum. With respect to changes in attitudes to IPE, we could not demonstrate a difference between interventions delivered in early and later years of the curriculum. PROSPERO registration number: CRD42020160964.


Background
According to the World Health Organization (WHO), Interprofessional Education (IPE) occurs when "students from two or more professions learn about, from, and with each other to enable effective collaboration and improve health outcomes" [1]. Safe, high-quality, accessible, patient-centred care requires continuous development of interprofessional competencies [2], and IPE has repeatedly been called for, so that healthcare students can enter the workforce as effective collaborators [3][4][5].
A growing amount of empirical work shows that IPE can have a beneficial impact on learners' attitudes, knowledge, skills, and behaviours (the so-called collaborative competencies) [6,7], and can positively affect professional practice and patient outcomes [8,9]. IPE may enhance attitudes toward teamwork and collaboration during training, leading to improved attitudes towards interprofessional work and improved patient care upon graduation. Nevertheless, the complexity of simultaneous teaching for different healthcare disciplines, as well as logistical problems and busy timetables, raises issues concerning the introduction of IPE interventions. The optimal timing to introduce IPE and whether immersion (i.e. continuous collaborative learning) or exposure (i.e. periodic collaborative activities) should be adopted [10] are still subject to debate. Gilbert [11] suggests exposure during the early years and immersion in the graduation year. Reasons for this include ensuring the optimal development of students' professional identity before expecting them to work collaboratively with others. However, IPE introduced later in the curriculum may be hampered by the students' focus on profession-specific clinical practice, and by immersion in vocation-specific stereotypes or negative attitudes [10]. Current undergraduate literature shows a tendency to introduce IPE earlier, even in the first year of studies [11,12], but the most effective timing to perform IPE interventions in the medical curriculum remains to be determined.
We undertook a systematic literature review to determine the most effective time to introduce IPE to preregistration medical students. Additionally, we were interested in exploring the nature of the training, the assessment methods and the study outcomes. Our systematic review was guided by the research question: "What is the optimal time to institute interprofessional education interventions in the medical school curriculum?"

Study design
We performed a systematic review of the literature focusing on interprofessional learning interventions in pre-registration medical students and applied a review protocol based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) statement [13]. We also aimed to perform a meta-analysis with studies grouped by type of assessment. This systematic review was registered in PROSPERO (www.crd.york.ac.uk) with the number CRD42020160964.

Data sources and selection criteria
The systematic literature search was performed on December 12, 2019, using the databases PubMed, PsycINFO, EThOS, EMBASE, PEDro and SCOPUS. The following keywords and subject headings were used as search terms: interprofession*, interprofessional education, inter professional, inter professionally, IPE, and medical student. We included all peer-reviewed articles in English and German that reported on evaluative studies of IPE interventions including medical students, and were published after the 2011 Interprofessional Education Collaborative (IPEC) report [2]. The full search strategy is available in an additional Word file [see Additional file 1]. In addition, we included articles found in the reference lists of previous reviews on IPE, discovered as a result of the search for IPE interventions [4,6,9,14-22].

Inclusion criteria
We included studies that reported on assessment of knowledge, skills or attitudes (KSA), with an IPE intervention, and that reported quantitative results with a validated IPE instrument. We included only studies using instruments that had previously been comprehensively validated with various psychometric tests. Validated questionnaires provide reliable and valid results, can be used to benchmark or compare results on an international level [23], and permit statistical comparisons, thereby increasing rigour and allowing for a meta-analysis. One limitation of the use of validated questionnaires is the lack of further piloting or cultural adaptation, which may induce bias. We also narrowed our search to groups of at least 35 medical students in the same year of their medical education programme, to ensure an adequate sample size for statistical validity. To avoid interventions in overlapping years of education, we selected studies reporting on interventions with a duration of at most 6 months (regardless of the type of intervention, the study programme, and the educational year of other students taking part). Although we encountered qualitative IPE studies, we chose a positivist approach because it better aligned with our intention to perform a meta-analysis.

Exclusion criteria
We excluded conference contributions and abstracts without a related peer-reviewed published article. We also excluded all non-validated questionnaires and articles without available full-text in English or German.

Identification of potentially eligible studies
After the primary search, all titles and abstracts were screened and duplicates or non-relevant articles were excluded. The full text of the remaining articles was read by two authors (JBE and AF) to identify the eligible articles for this review. All potentially eligible articles were imported into a software platform for systematic reviews (http://rayyan.qcri.org) [24] to expedite the screening of abstracts and titles and to determine the final selection of eligible studies. The two authors initially performed selection in a blinded mode with three options: "include", "exclude" and "maybe". After finishing the first personal assessment, results were unblinded and disagreements were resolved by discussion of individual papers to find consensus. The study selection process is outlined in the PRISMA Flow Diagram - Fig. 1.

Data extraction and synthesis
The data extraction form was developed by two reviewers, informed by the form from Reeves et al. [9] but modified to include important aspects specific to this review, including the ratio of study year to total duration of studies and a classification of "early" or "late" depending on whether the IPE intervention occurred in the first or second half of medical studies. The reviewers extracted additional data regarding the context of study, recruitment, description of participants, study design, results and conclusions. The analysis of the risk of bias was performed independently, at a later stage. RG moderated in case of disagreement. Upon completion of article extraction, data were analysed using the Statistical Package for the Social Sciences (SPSS) 23.0 (IBM Corp., Armonk, NY, USA). We report descriptive statistics for quantitative data (median, IQR). Data extracted were synthesised in a narrative manner, using an integrative and aggregative approach [25].
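The "early" or "late" classification described above can be illustrated with a short sketch. This is not the authors' extraction form: the function name is hypothetical, and the treatment of the exact midpoint (a ratio of 0.5 counts as "early") is an assumption, since the text only distinguishes the first and second half of medical studies.

```python
def classify_timing(study_year: int, programme_years: int) -> tuple[float, str]:
    """Return (ratio of study year to programme length, 'early' or 'late')."""
    if not 1 <= study_year <= programme_years:
        raise ValueError("study_year must lie between 1 and programme_years")
    ratio = study_year / programme_years
    # Assumption: a year that ends at or before the midpoint of the
    # programme belongs to the first half and is labelled "early".
    label = "early" if ratio <= 0.5 else "late"
    return ratio, label


if __name__ == "__main__":
    # Third year of a 4-year programme versus third year of a 6-year programme.
    print(classify_timing(3, 4))
    print(classify_timing(3, 6))
```

Under this assumption, the same study year can fall into different categories depending on programme length, which is why the total duration of studies matters for the comparison.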

Quality assessment and risk of bias
The quality of included studies was also evaluated by JBE and HC using a standardised critical appraisal tool, the McMaster Critical Review Form for Quantitative Studies [26]. If research articles met each criterion outlined in the appraisal guidelines, they received a score of "one" for that item, or, if they did not, a score of "zero". Item scores were then summed to provide a score of a maximum of 16, with 16 indicating excellent methodological rigour. The quality was defined as poor when the overall score was 8 or less, fair if 9-10, good if 11-12, very good if 13-14 and excellent if 15-16 [27]. This tool was chosen for this systematic review as it is published, freely available, has been used extensively, and can be applied to a range of research designs [28]. Differences in judgment were resolved through discussion.
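As a minimal sketch of the scoring rule just described, the following function sums 16 binary item scores and maps the total to the quality bands given in the text. The function name and the input format (a list of 0/1 values) are illustrative assumptions and are not part of the McMaster form itself.

```python
def quality_rating(item_scores: list[int]) -> tuple[int, str]:
    """Sum 16 binary appraisal items and return (total, quality band)."""
    if len(item_scores) != 16 or any(s not in (0, 1) for s in item_scores):
        raise ValueError("expected 16 item scores, each 0 or 1")
    total = sum(item_scores)
    # Bands as stated in the text: <=8 poor, 9-10 fair, 11-12 good,
    # 13-14 very good, 15-16 excellent.
    if total <= 8:
        band = "poor"
    elif total <= 10:
        band = "fair"
    elif total <= 12:
        band = "good"
    elif total <= 14:
        band = "very good"
    else:
        band = "excellent"
    return total, band


if __name__ == "__main__":
    # Hypothetical study meeting 12 of the 16 criteria.
    print(quality_rating([1] * 12 + [0] * 4))
```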

Statistics
A meta-analysis for those studies using the Readiness for Interprofessional Learning Scale (RIPLS) [10,29-33] was attempted with the R meta package [34], as this scale was most often used. Otherwise, descriptive analyses were conducted, including frequencies. Where applicable, scales were reversed by subtracting the mean from the maximum score for the scale to ensure a consistent direction of effects across studies. Weighted means of subscales were calculated for each study using the number of participants as weights. Pooling of estimates on the single-item level was not possible, as Sheu et al. [30] only reported on subscale level. Estimates of weighted means of subscales are reported with 95% confidence intervals (CIs). A random effects model was used with the inverse variance method for pooling of estimates across the remaining studies using RIPLS. Standard deviations of mean changes were not given and had to be calculated according to Cochrane's Handbook [35], which introduced further uncertainty by the need to choose a more or less arbitrary correlation coefficient for standard deviations. The meta-analysis was conducted using the R 3.5.0 statistical package (R Foundation for Statistical Computing, Vienna, Austria) after related content was extracted, and all remaining analyses were conducted with SPSS v.23 (IBM Corp., Armonk, NY, USA).
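The steps named in this section (scale reversal, participant-weighted means, and inverse variance pooling under a random effects model) can be sketched in Python as follows. This is an illustration only: the analysis itself was attempted with the R meta package, the function names here are hypothetical, and the DerSimonian-Laird estimator for the between-study variance is an assumption, as the text does not state which estimator was used.

```python
import math


def reverse_mean(mean: float, scale_max: float) -> float:
    """Reverse a scale mean as described in the text: maximum minus mean."""
    return scale_max - mean


def weighted_mean(means: list[float], ns: list[int]) -> float:
    """Mean of subscale means, weighted by the number of participants."""
    return sum(m * n for m, n in zip(means, ns)) / sum(ns)


def pool_random_effects(estimates: list[float], std_errors: list[float]):
    """Inverse variance pooling with a DerSimonian-Laird tau^2 (assumed).

    Returns (pooled estimate, (ci_low, ci_high), tau2, cochran_q).
    """
    w = [1.0 / se ** 2 for se in std_errors]            # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0     # between-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, ci, tau2, q


if __name__ == "__main__":
    # Hypothetical mean changes and standard errors, not data from the review.
    print(pool_random_effects([0.10, 0.35, 0.20], [0.08, 0.10, 0.12]))
```

A large Cochran's Q relative to its degrees of freedom yields a positive between-study variance, which widens the confidence interval of the pooled estimate.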

Meta-analysis
Initially we planned to undertake a meta-analysis of all studies included in the review. However, because the studies used a broad range of instruments covering different factors, this was not feasible. Instead, we performed the analysis with the RIPLS, as it was the most frequently used instrument, in the knowledge that this would only represent 26% of the articles in this review. Due to the heterogeneity in the reporting of RIPLS results, a sound estimation of summary scores across studies was hampered. Whereas Darlow et al. [33] and Hudson et al. [10] used altered instruments with more than 19 items, Chua et al. [29], Paige et al. [32], Sheu et al. [30] and Sytsma et al. [31] used the original 19-item RIPLS. However, in the article by Paige et al. [32], the item "For small group learning to work, students need to trust and respect each other." is missing and the author did not respond to an email requesting further information. Given the extensive heterogeneity, both in reporting and as statistically tested (Cochran's Q, p < 0.01 for the meta-analysis of Chua et al. [29], Paige et al. [32], Sheu et al. [30] and Sytsma et al. [31] for the subscales team, identity and role; see Additional file 3, supplemental_material_IPE_RIPLS_original_data.xls), the combination of the single study data into a summary measure seems prone to error. Additionally, authors used means and standard deviations in the original articles, which are not the appropriate summary measures for Likert scaled items. As Sheu et al. [30] only reported the means and standard deviations of RIPLS subscales, a merging of information for meta-analysis was only possible on that level and not on a single item level.
Furthermore, the standard deviations for the mean changes (difference of scores pre-test-post-test) were not given and had to be estimated according to Cochrane's Handbook (16.1.3.2 Imputing standard deviations for changes from baseline), which introduced further uncertainty by the need to choose a rather arbitrary correlation coefficient of standard deviations (0.4 in our case). With regard to the pragmatic heterogeneity of interventions across studies, an ordinary pre-test-post-test score difference is too simple a way to capture the information created by the original studies. All in all, a meta-analysis could not be performed because of the high heterogeneity of the instruments used and the inconsistent data reporting.
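To show why the choice of correlation coefficient matters, the following sketch applies the Cochrane Handbook formula for imputing the standard deviation of a change from baseline, SD_change = sqrt(SD_pre^2 + SD_post^2 - 2 * r * SD_pre * SD_post). The function name and the numeric inputs are hypothetical and are not taken from the included studies; only the value r = 0.4 comes from the text.

```python
import math


def sd_change(sd_pre: float, sd_post: float, corr: float) -> float:
    """Impute the SD of a pre-post change from the two SDs and a correlation."""
    if not -1.0 <= corr <= 1.0:
        raise ValueError("corr must lie between -1 and 1")
    return math.sqrt(sd_pre ** 2 + sd_post ** 2 - 2.0 * corr * sd_pre * sd_post)


if __name__ == "__main__":
    # Hypothetical pre-test and post-test SDs; 0.4 is the value chosen in the review.
    for r in (0.2, 0.4, 0.6, 0.8):
        print(r, round(sd_change(0.6, 0.5, r), 3))
```

The imputed standard deviation shrinks as the assumed correlation grows, so the precision attributed to each study, and therefore its weight in a pooled estimate, depends directly on a value that the original articles did not report.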

Discussion
In this systematic review, we analysed IPE interventions based on 23 studies published between 2011 and 2019. Our findings show that medical students were exposed to IPE interventions at various points in their training, and we could establish evidence of effectiveness of IPE. Three studies involved only medical students and therefore did not meet the WHO definition of IPE. However, they reported on interprofessional interventions and therefore were not excluded from this systematic review. All years except the fifth study year were represented, so no preference for pre-clinical or clinical years could be observed. However, studies in the first four years of medical education were more frequent. This may reflect variation in the length of preregistration medical education programmes worldwide. In the USA, medical school consists mainly of 4 years of training (generally preceded by a 3-4-year Bachelor's degree), while in Europe it averages 6 years (without a preceding program) [55].
In Europe, most medical university programmes are public, and rather larger cohorts of students are educated (e.g., Germany has 36 public and only two private medical schools, and almost 10,000 new medical students per educational year, leading to an average class size of over 260 students) [56], while in the USA (141 fully-accredited medical schools), more than one third are private (n = 56) and class size is much smaller, with an average of 146 students per educational year [56,57]. This may also explain the higher frequency of studies from the USA, as implementing IPE elements could be more feasible with smaller classes, and private medical schools may suffer more pressure to evaluate their programmes.
The optimal timing to introduce IPE is still subject to debate [10]. In clinical years it may seem reasonable, as it contributes to optimal development of students' professional identities and gives them experience in working collaboratively with students in different health professions [11]. However, the introduction of IPE so late in the medical curriculum may be complicated by the students' focus on profession-specific clinical practice [10]. On the other hand, introducing IPE early in pre-registration healthcare courses may be useful in breaking down negative attitudes and avoiding stereotypes [58][59][60].
From our analysis we could not determine the best time to introduce IPE, as both pre-clinical and clinical IPE interventions showed some degree of success. It appears that late IPE interventions tend to be longer and more often reach statistical significance. It seems reasonable to conclude that interventions should be introduced in the early years and continue throughout the curriculum. More well-designed studies are needed to address this gap in knowledge. Published IPE interventions had a pre-test-post-test design and most studies were cross-sectional. Interventions varied in their type and topic, group sizes were small and most activities were only performed once. There was also a paucity of studies reporting medium and long-term outcomes. Most studies (78%) were of good or very good quality, although a small proportion still scored poorly. This is consistent with previous reviews [4,6,15,18]. This trend limits the development of strategies for targeting long-term behaviour changes and the potential to positively impact patient outcomes. Longer interventions and longitudinal follow-up of learning outcomes are key to identifying robust outcomes that lead to changes in practice. An increasing number of studies now report mid- and long-term outcomes, but, as we can see from our own sample, these are still a minority. More studies are needed on models for pre-licensure IPE interventions (including adequate evaluation of their effectiveness), particularly regarding long-term outcomes [9,31,61]. In situations where prolonged IPE training is not feasible due to organizational limitations, intermittent interventions may be a good strategy [47]. The heterogeneity of most outcome measures may also limit the ability to draw conclusions about best practices and has, in our case, prevented the accomplishment of a meta-analysis.
Studies were most frequently assessed with RIPLS. The Readiness for Interprofessional Learning Scale, developed in 1999, was among the first scales developed for measurement of attitudes towards interprofessional learning [62]. It has been translated and acculturated into several languages [63]. The scale is very popular, but it has not been updated, it fails to embody all the dimensions of the Core Competencies for Interprofessional Collaborative Practice [2], and its conceptual framework has recently been questioned [63]. Additionally, concerns about its low internal consistency at item level and subscale results, raised by the RIPLS authors themselves, perpetuate the debate of what exactly the RIPLS is measuring [64], and there have even been past recommendations to abandon the scale altogether [23,65]. Finally, some newer scales, more aligned with the IPEC dimensions, have also been successfully tested and acculturated [66,67]. While educators, curriculum planners and policy makers continue to struggle to identify methods of interprofessional education that lead to better practice [9], clearer measures of interprofessional competency are needed to assess the outcomes from health professional degree programs and to determine what approaches to interprofessional education benefit patients and communities.
The results from this review and from individual studies should be interpreted with caution: students' educational backgrounds, as well as attitudes, expectations and stereotypes, may vary considerably between institutions and countries and may influence how the IPE interventions are experienced. This probably accounts for many differences in effectiveness of IPE activities in different settings [15]. Additionally, a few studies described a "package" of interprofessional activities, and medical curricula differ significantly, which may introduce more bias. University IPE programmes should agree on a comparable methodology that aligns with research in IPE (e.g., larger cohorts, multi-centre studies) and should focus on fewer instruments to measure IPE, adequately assessed for validity, responsiveness, reliability, and interpretability [45].
There is a broad variation in the length of the medical curriculum between continents and countries. Most of the studies did not explain their specific curriculum to the reader. For many articles, we were not able to determine the total length of medical studies and therefore whether the IPE intervention took place in the final year, which would have been relevant to this literature review. To bridge this gap in knowledge, we propose that future research reports should briefly describe their specific medical curriculum.
Our methodology also has limitations. We decided a priori to include only papers with at least 35 medical students. The reason was to have sufficiently powered studies in the sample. However, this may have led to some selection bias, or left out potentially relevant interventions. Because we were interested in IPE effects on medical students, we also excluded all studies that did not report specific results for medical students. This limited the number of positive studies available. Similar to other systematic reviews, our work aimed to exclude all "lower quality" studies (i.e., non-randomised, non-experimental, qualitative studies) [9,16,20]. Reflecting on our methods, we question whether they are adequate for social or educational research, as there are repeated appeals for more qualitative reviews in IPE [61].
Unfortunately, there were also several issues that made a meta-analysis impossible. First, as RIPLS uses a Likert scale (therefore, an ordinal scale), central tendency should be summarised with the median value. However, most studies in this sample chose to report the mean. This is acceptable if one assumes equal distances between response categories, but that assumption is very unrealistic. Additionally, students responding to pre- and post-intervention questionnaires were pooled cohorts, and items differed in wording (questionnaires were slightly modified). In some studies, certain items were not reported. In other studies, items were sometimes scored reversely (negative attitudes), and some studies did not report the change in score, which is the outcome of interest for the meta-analysis.
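The point about summary measures for ordinal items can be made concrete with a small sketch. The responses below are invented for illustration and do not come from any included study, and the function name is hypothetical. The sketch reports the mean alongside the median and the response counts for a single 5-point Likert item.

```python
from collections import Counter
from statistics import mean, median


def summarise_likert(responses: list[int]) -> dict:
    """Summarise one Likert item with its mean, median and response counts."""
    return {
        "mean": mean(responses),      # assumes equal distances between categories
        "median": median(responses),  # only assumes that categories are ordered
        "counts": dict(sorted(Counter(responses).items())),
    }


if __name__ == "__main__":
    # Hypothetical responses on a 1-5 scale: one strongly dissenting student.
    print(summarise_likert([1, 4, 4, 4, 5]))
```

In this invented example, a single extreme response pulls the mean below the category chosen by most respondents, while the median stays at that category. When only means and standard deviations are published, the median and the response distribution cannot be recovered for pooling.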