Implementation of a multi-level evaluation strategy: a case study on a program for international medical graduates

Evaluation of educational interventions often focuses on immediate or short-term metrics associated with knowledge and skills acquisition. We developed an educational intervention to support international medical graduates working in rural Victoria. We wanted an evaluation strategy that captured participants' reactions and also considered transfer of learning to the workplace and retention of learning. However, with participants in distributed locations and limited program resources, this was likely to prove challenging. Elsewhere, we have reported the outcomes of this evaluation. In this educational development report, we present our evaluation strategy as a case study, describing its underpinning theoretical framework, its design, and its benefits and challenges. The strategy sought to address issues of program structure, process, and outcomes. We used a modified version of Kirkpatrick's model as a framework to map our evaluation of participants' experiences, their acquisition of knowledge and skills, and the application of these in the workplace. The predominant benefit was that most of the evaluation instruments allowed the program to be personalized. The baseline instruments provided a broad view of participants' expectations, needs, and current perspectives on their role. Immediate evaluation instruments allowed ongoing tailoring of the program to meet learning needs. Intermediate evaluations provided insight into the transfer of learning. The principal challenge was the resource-intensive nature of the evaluation strategy: a dedicated program administrator was required to manage data collection. Although the approach is resource-intensive, we recommend baseline, immediate, and intermediate data collection points, and we found multi-source feedback especially illuminating. We believe our experiences may be valuable to faculty involved in program evaluations.


INTRODUCTION
Evaluation is an essential step in curriculum or program development. However, evaluation is often not given prominence during program development, as resources are directed towards implementation. There are benefits to developing the evaluation strategy contemporaneously with the program. These include a clear focus on measurable program outcomes, an educational design that may promote learning (e.g., deep levels of participant reflection), and evaluation activities that can be scheduled as part of the program.
The program evaluation literature has documented many approaches extensively [1-8]. Program evaluation is essential for quality assurance. We adopted a 'traditional' approach to program evaluation that measures structure, process, and outcomes. Examples of 'structural' elements include the content of the program, the number and timing of sessions, physical infrastructure, demographics, and faculty expertise. 'Process' elements refer to the usefulness or value of the educational methods and provide insight into faculty and participant reactions to specific sessions and to the overall program. 'Outcome' elements refer to changes in participants as a consequence of participating in the program.
http://jeehp.org J Educ Eval Health Prof 2011, 8: 13 • http://dx.doi.org/10.3352/jeehp.2011.8.13
In this case study, we describe the development and implementation of the evaluation strategy for a program designed to support international medical graduates (IMGs) working in rural Victoria, Australia. There are shortages of doctors in rural practice, and IMGs make a substantial contribution to healthcare services. Rural locations are often the first appointment for IMGs in Australia [9-11]. Orientation to the healthcare system is critical but often overlooked. We developed a program, Gippsland Inspiring Professional Standards for International Experts (GIPSIE), to support IMGs working in rural Victoria. Elsewhere, we describe the GIPSIE program and the results of the evaluation [12]. We have summarized key elements of the program in Appendix 1. The GIPSIE program comprised a weekend workshop and four subsequent evening sessions over three months. Simulation-based training was a prominent theme; it addressed clinical knowledge, attitudes, and skills, and included a range of activities (e.g., procedural skills training with a part-task trainer, management of the acutely ill patient with manikins, and patient assessment skills with simulated patients). Diverse clinical communication skills were explored (e.g., teamwork, handover, telephone communication, and communicating critical information). Audiovisual review of performance was enabled through video playback in small groups and, later, for individual IMGs on iPod Nano devices. GIPSIE was underpinned by a website offering diverse learning resources. Content experts were invited to lead sessions that integrated knowledge and skills reflecting local practice.
GIPSIE had three lead academic faculty (AW, MR, DN) supported by several other academics (including CH and CS), clinicians, and an administrator. Seventeen participants entered the GIPSIE program, which was implemented in 2008 and 2009. Fifteen participants completed GIPSIE and rated the program highly, especially the simulation-based activities with feedback, the later audiovisual review on iPods, and the GIPSIE website. However, suggestions were made for improving several aspects of the program. Participants reported increased knowledge, skills, and professionalism after the program. Although overall multi-source feedback (MSF) scores showed no statistically significant changes, there were positive directional changes for three items: technical, teaching, and communication skills. These developments were also supported by qualitative comments. Learning was reported to be sustained three months after the program.
In this case study on educational development, we describe the development and implementation of the evaluation, along with its benefits and challenges. Our goal is to share our experience of the process rather than the outcomes of this approach to evaluation. Kirkpatrick [13] developed a 4-level model for evaluating vocational and training programs. The levels explore trainees' reactions, learning, behavioural changes, and any resulting change in organizational practice. Kirkpatrick's original model implied that all levels should be addressed for a full and meaningful evaluation of learning. Barr et al. [14] adapted the original model into a 6-level model partly contextualized to healthcare. Appendix 2 illustrates the levels of evaluation, what is measured at each level, examples of evaluation methods, and their relevance and practicality [14]. The evaluation methods increase in complexity with each level.

MEASURES OF PROGRAM IMPACT
We had several goals in the evaluation of GIPSIE. Using the adapted version of Kirkpatrick's model of training impact (Appendix 2), we wanted to address as many levels as our resources allowed. We also wanted to address retention of learning, which is often omitted from training evaluations [15]. That is, we wanted to design an evaluation strategy that would detect development in trainees' knowledge, attitudes, and skills, as well as sustained changes in clinical practice. Here we outline the evaluation strategy and its challenges.

EVALUATION INSTRUMENTS, DATA COLLECTION, AND ANALYSIS
There were eight instruments in the evaluation strategy; these are listed in Appendix 3. We divided the time frame for data collection into three stages: baseline data, collected before participants started the program; immediate response to the program (including participant reactions), collected during the program; and intermediate-term response to the program, collected at least three months after the program. All GIPSIE participants were invited to participate in each evaluation activity.

Baseline data
Baseline data was collected to gain insight into our diverse participants, to tailor the educational program to them, and to provide a basis for comparison with outcome data.

Instrument 1: Demographics and experience of living and working in Gippsland (Pre-program)
Participants completed a paper-based survey recording age, sex, experience of living and working in Gippsland, career goals, and experience with a range of educational methods. Responses included satisfaction ratings and free-text comments. The survey content was derived from our reading of the literature and from issues we considered relevant to our region.

Instrument 2: Baseline learning needs analysis (LNA) (Pre-program)
Before the program began, participants were sent a paper-based form and asked to identify their expectations and learning goals for the GIPSIE program. Responses were in free-text format. The individual and collated content of the LNAs was used to adjust the program content and personalize learning. Participants reviewed their LNA during the program and on its completion.

Instrument 3: MSF (Pre-program)
The main outcome measure was MSF, collected before and after the program. MSF is also known as peer assessment or 360-degree feedback. We used a validated instrument designed for workplace-based assessment that is easily integrated with clinical practice [16]. Each IMG was asked to nominate up to twelve colleagues to make judgments on sixteen facets of clinical practice, using a six-point scale reflecting level of competence. We also asked participants to self-assess using this form, so that they could build a picture of how they saw themselves compared with how others saw them.
The process for collecting MSF data is presented in Appendix 4. MSF assessments were completed before the program and then three months after it. Assessor identifiers were removed from the collated results provided to participants.
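The collation step described above (peer ratings summarized per item with assessor identifiers removed, set beside the participant's own self-assessment) can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the function name `collate_msf`, the use of a simple per-item mean, and the treatment of blank items are hypothetical choices, not the scoring rules of the published instrument [16]. The three item names are borrowed from the facets mentioned in the results purely as labels, and the ratings are invented.

```python
from statistics import mean


def collate_msf(peer_returns, self_ratings):
    """Collate one participant's MSF returns without assessor identifiers.

    Hypothetical layout (an assumption, not the instrument's format):
    peer_returns: list of dicts {"assessor_id": str, "ratings": {item: int or None}},
                  ratings on the six-point competence scale (None = item left blank).
    self_ratings: dict {item: int} from the participant's self-assessment.
    Returns {item: (peer_mean, self_rating, self_minus_peer)}.
    """
    # Drop assessor identifiers before any aggregation; only ratings are kept.
    anonymous = [r["ratings"] for r in peer_returns]
    summary = {}
    for item, self_score in self_ratings.items():
        # Ignore assessors who left this item blank.
        scores = [ratings[item] for ratings in anonymous
                  if ratings.get(item) is not None]
        if not scores:
            summary[item] = (None, self_score, None)
            continue
        peer_mean = round(mean(scores), 2)
        summary[item] = (peer_mean, self_score, round(self_score - peer_mean, 2))
    return summary


# Hypothetical returns for one participant (illustrative numbers only).
peer = [
    {"assessor_id": "A1", "ratings": {"technical": 5, "teaching": 4, "communication": 5}},
    {"assessor_id": "A2", "ratings": {"technical": 4, "teaching": None, "communication": 5}},
    {"assessor_id": "A3", "ratings": {"technical": 5, "teaching": 5, "communication": 4}},
]
self_view = {"technical": 4, "teaching": 5, "communication": 4}
report = collate_msf(peer, self_view)
```

A negative `self_minus_peer` value means the participant rated themselves lower than their colleagues did on that item, which is the kind of self/peer contrast participants were supported in interpreting.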

IMMEDIATE RESPONSE TO THE PROGRAM
This data was collected to capture participants' experiences of GIPSIE including their perception of changes in knowledge and skills and the usefulness of the educational methods.

Instrument 4: Workshop evaluation (Weekend workshop)
After the weekend workshop, participants were given a paper-based form and asked to rate the degree to which they had met each prescriptive learning objective (1 = "not at all met" to 6 = "completely met") and the helpfulness of the educational methods (1 = "not at all helpful" to 6 = "completely helpful"). Participants were also asked to identify what worked well and what needed to be improved.

Instrument 5: End of session evaluations (4 x evening sessions)
Immediately after each evening session, participants were given a paper-based form and asked to rate, on the same scales described above, the degree to which they had met the prescriptive learning objectives and the helpfulness of the educational methods. Participants identified what worked well and what needed to be improved. We also asked participants to record up to five things they had learned in each session, which gave us insight into what they valued and what might have been new to them.
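Because these immediate evaluations were used to tailor later sessions, it may help to see how one batch of forms could be summarized per learning objective. The sketch below uses medians and ranges because the 1-6 ratings are ordinal. The function name `summarize_session`, the objective labels, and the `flag_below` cut-off of 4 are hypothetical choices for illustration; they are not part of the GIPSIE forms, and the ratings are invented.

```python
from statistics import median


def summarize_session(forms, flag_below=4):
    """Summarize anonymous end-of-session ratings per learning objective.

    forms: list of dicts {objective: rating}, one per returned form, with
           ratings from 1 ("not at all met") to 6 ("completely met").
    flag_below: hypothetical cut-off; objectives whose median rating falls
                below it are flagged for attention in later sessions.
    """
    objectives = sorted({obj for form in forms for obj in form})
    summary = {}
    for obj in objectives:
        scores = [form[obj] for form in forms if form.get(obj) is not None]
        if not scores:
            continue
        med = median(scores)
        summary[obj] = {
            "n": len(scores),
            "median": med,
            "range": (min(scores), max(scores)),
            "flag": med < flag_below,
        }
    return summary


# Hypothetical forms from one evening session (illustrative numbers only).
forms = [
    {"handover": 5, "website_quizzes": 3},
    {"handover": 6, "website_quizzes": 2},
    {"handover": 5, "website_quizzes": 4},
]
session = summarize_session(forms)
```

A flagged objective is only a prompt for discussion among faculty, alongside the free-text "what worked well / what needs improvement" responses, rather than an automatic decision rule.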

INTERMEDIATE-TERM RESPONSE TO THE PROGRAM (THREE MONTHS AFTER THE PROGRAM)
This data was compared with baseline data to assess the longer-term impact of the GIPSIE program.

Instrument 6: Telephone interview (Post-program)
A topic guide was used to explore participants' experiences of the program and the impact of those experiences on their work. Program faculty developed the topic guide content to reflect GIPSIE goals and participants' perceptions of the program content and educational methods. Telephone interviews were scheduled at times that suited participants, and detailed notes were taken during each interview. These notes were read back to each participant as a form of validation. Some verbatim statements were recorded.

Instrument 7: GIPSIE website evaluation (Post-program)
User access information was recorded and collated. Participation in online quizzes and other web-based learning activities (e.g., the bulletin board) was monitored through frequency of login, time online, and number of contributions.

Instrument 8: MSF or Peer Assessment Tool (PAT) (Post-program)
See Instrument 3. Each participant's data was presented in collated form so that they could monitor their progress from program commencement to completion. We used overall summary data to measure the impact of the GIPSIE program on participants' performance.
The alignment of the evaluation strategy with the modified Kirkpatrick model is illustrated in Appendix 2. Instruments 1, 2, and 3 do not appear in the table because they collected baseline data; they are nonetheless critical to the process, because baseline data was essential for identifying post-program gains. Instruments 4, 5, 6, and 7 explored participant reactions at different points in time (Level 1). Instruments 6 (self-report) and 8 (MSF) provided insight into gains assessed after the program had finished, in knowledge (Level 2) and in application of learning during and after the program (Level 3). Instrument 8 (MSF) was also intended to capture the impact of participants on the clinical environment (Level 4). Within our resources, it was not possible to address benefits to patients/clients (Level 5).

DATA ANALYSIS
Quantitative data was entered into SPSS ver. 18.0 (SPSS Inc., Chicago, IL, USA) for analysis. Descriptive statistics were used to summarize the data. Because the sample was small, we used nonparametric statistics: pre- and post-program differences within individuals were tested with the Wilcoxon signed-rank test. Statistical significance was set at p < 0.05.
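The Wilcoxon signed-rank test compares each participant's post-program score with their own pre-program score, ranking the absolute differences and asking whether positive and negative changes are balanced. The sketch below shows what the test computes, using only the Python standard library and an exact p-value obtained by enumerating all sign patterns, which is feasible for cohorts of this size. It is an illustration, not the SPSS procedure used in the study; statistical packages may report a normal-approximation p-value instead, so results can differ slightly. The function name and the example scores are hypothetical.

```python
from itertools import product


def wilcoxon_signed_rank(pre, post):
    """Exact two-sided Wilcoxon signed-rank test for paired scores.

    Returns (W, p) with W = min(W+, W-). Zero differences are dropped
    (Wilcoxon's original convention) and tied absolute differences get
    average ranks. The p-value enumerates all 2**n sign patterns, which
    is only practical for small samples (roughly n <= 20).
    """
    # Rounding guards against float noise (e.g. 4.6 - 4.2 vs 4.3 - 3.9)
    # splitting what should be tied differences.
    diffs = [round(b - a, 9) for a, b in zip(pre, post)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank absolute differences (1-based), averaging ranks within ties.
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1

    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_obs = min(w_plus, total - w_plus)

    # Under H0 each rank is equally likely to carry a + or - sign.
    as_extreme = 0
    for signs in product((False, True), repeat=n):
        wp = sum(r for r, positive in zip(ranks, signs) if positive)
        if min(wp, total - wp) <= w_obs + 1e-9:
            as_extreme += 1
    return w_obs, as_extreme / 2 ** n


# Hypothetical paired means for six participants (illustrative only).
pre = [4.2, 4.5, 3.9, 4.8, 4.1, 4.4]
post = [4.6, 4.5, 4.3, 4.9, 4.0, 4.7]
w, p = wilcoxon_signed_rank(pre, post)
```

With only a handful of non-zero differences, even a consistent improvement yields a fairly large smallest attainable p-value (for six all-positive differences it is 2/64, about 0.031), which is one reason small cohorts rarely reach significance on this test.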
Qualitative data (free-text comments and telephone interview data) were analysed thematically. The researchers (DN, AW, CH) identified themes independently and then negotiated agreement. An external evaluator (CS) reviewed de-identified data to ensure a rigorous evaluation.

BENEFITS AND CHALLENGES ASSOCIATED WITH INSTRUMENTS
In this section, we identify the benefits and challenges of the instruments as we experienced them.

Instrument 1: Demographics and experience of Gippsland
The benefits of this approach included the ease of data collection and analysis. Participants readily shared their experiences. Collated data was used in an early session of the program to personalize content. Participants appeared to value this approach, and it provided a platform to share both the highs and lows of living and working in Gippsland. By exploring positives and negatives, we conveyed to participants that we wanted to hear all views. There were no significant challenges with this instrument except ensuring that individual participants' personal experiences were not revealed without their permission. Sensitive questioning and prompting gave participants opportunities to elaborate further on relevant information themselves.

Instrument 2: Baseline learning needs forms
There were several benefits to using this instrument. The most obvious was that participants were encouraged to think deeply about what they wanted to achieve. It also gave us insight into what participants thought GIPSIE might be able to address. Learning needs outside the scope of GIPSIE could be clarified at the outset, an important aspect of matching program objectives with participant expectations. Data was easily recorded. The principal challenge (or weakness) was the quality of the information participants provided. We gave examples of learning needs on the form to illustrate how they might be described, and most participants then reported issues similar to those examples. However, some participants provided additional examples, and on questioning, these needs appeared genuine.

Instruments 3 & 8: MSF
Instruments 3 and 8 are the same instrument administered at different times. The main benefit of this instrument at program commencement was that we gained insight into the participants as their colleagues perceived them. It also conveyed to the participants' colleagues that they were enrolled in a prescribing training program. Although the numerical ratings were interesting, the free-text comments were often more helpful, especially when they were detailed. However, the process of collecting the data was highly resource-intensive. We collected the data before we had met the program participants in person.
Some participants found it difficult to identify more than eight assessors because they had relatively short work exposure or worked in small organizations where they were not well known to colleagues outside their unit. Collecting the assessor forms required significant follow-up, so a designated program manager was required. Despite these challenges, our response rates were satisfactory. Each participant had between two and eight returns at baseline, with a modal return number of six. After GIPSIE, each participant had five to nine MSF returns, again with a modal value of six. Respondents seemed highly engaged in supporting their IMG colleagues.
We asked IMGs to self-assess using this instrument, which gave participants valuable insight into how they viewed themselves in relation to their peers. Although there are issues associated with self-report, participants found the process insightful and sometimes confronting. We had to ensure that participants were supported in making sense of this data; again, this was labour-intensive, but participants valued it highly as a learning experience.

Instruments 4 & 5: Workshop evaluation & End of session evaluations
The benefit of these evaluations was that we received immediate insight into participants' experiences of the program. Because the data was collected immediately after each session, we were able to adjust subsequent learning objectives and educational methods. For example, for some educational methods, we needed to invest more time in orienting participants to their use (e.g., GIPSIE website questionnaires). The main challenge was that participants may have responded uncritically. Given the relatively small number of GIPSIE participants, there may have been a reluctance to share true feelings, especially critical ones. We tried to ensure that completing the forms was a private event and that forms were returned anonymously.

Instrument 6: Telephone interview
There were several benefits to this method, including the highly personalized nature of data collection. Although participants might have provided what they considered 'socially desirable' responses, we felt reasonably confident that they spoke freely and authentically. There were very few criticisms. Participants appreciated the attention and value we placed on their feedback. The challenges were again related to human resources, because the interviews were time-consuming and difficult to schedule. We also could not always be certain where participants chose to receive the telephone call, and the setting might have impeded their freedom to share experiences.

Instrument 7: GIPSIE website evaluation
Although we planned this data collection, we did not use it in the final evaluation report. This was mainly because of the 'remote' management of the GIPSIE website and the relatively small number of participants. The website management was commissioned externally, and this seemed to create some communication challenges. Some participants were also very slow to start using the GIPSIE website, and given the small cohort size, we were confident that participants' self-reports were adequate to meet our evaluation needs.

CONCLUSION
Evaluating the impact of a training program requires a carefully planned and resourced strategy. In health professional training, our ultimate goal is to improve the health services offered to patients. However, involving patients directly in evaluation is challenging. Further, programs are often offered by those distant from trainees' workplaces, and ethics clearance requirements make the systematic collection of patient data difficult.
In this project, we sought to implement an evaluation strategy that addressed most levels of the modified Kirkpatrick framework.
Based on our experience, we make the following recommendations:
1. Encourage broad stakeholder involvement in the development of the strategy (e.g., inclusion of Gippsland-based IMGs and lay representatives).
2. Allocate adequate administrative support, especially for MSF and for booking telephone interviews.
3. Incorporate evaluation data into educational content and process. That is, schedule evaluation activities as part of the curriculum, and use the data collected to engage participants in a personalized program while ensuring relevance.
4. If using MSF, provide clear instructions to participants and assessors to minimize demands on their time. Indicate that contextualized free-text comments are highly valued. Reassure assessors about confidentiality, and reassure participants that the results will not be used in any way to influence their employment with their health service.
5. Incorporate participant feedback into ongoing program refinement and delivery to allow personalization of educational strategies as well as clarification of program objectives.
6. Ensure that externally commissioned contractual work is clearly articulated and includes progress reports.

CONFLICT OF INTEREST
No potential conflict of interest relevant to this article was reported.

Instrument 4: Workshop evaluation
Please help to identify the strengths and weaknesses in this program by completing the following evaluation form.
To what extent did you meet the following learning objectives?
Response scale: from "Not at all" to "Completely". Please add further comments here.