Confounding factors in using upward feedback to assess the quality of medical training: a systematic review

Purpose: Upward feedback is becoming more widely used in medical training as a means of quality control. Multiple biases exist, thus the accuracy of upward feedback is debatable. This study aims to identify factors that could influence upward feedback, especially in medical training. Methods: A systematic review using a structured search strategy was performed. Thirty-five databases were searched. Results were reviewed and relevant abstracts were shortlisted. All studies in English, both medical and non-medical literature, were included. A simple pro-forma was used initially to identify the pertinent areas of upward feedback, so that a focused pro-forma could be designed for data extraction. Results: A total of 204 articles were reviewed. Most studies on upward feedback bias were evaluative studies and only covered Kirkpatrick level 1-reaction. Most studies evaluated trainers or training, were used for formative purposes and presented quantitative data. Accountability and confidentiality were the most common overt biases, whereas method of feedback was the most commonly implied bias within articles. Conclusion: Although different types of bias do exist, upward feedback does have a role in evaluating medical training. Accountability and confidentiality were the most common biases. Further research is required to evaluate which types of bias are associated with specific survey characteristics and which are potentially modifiable.




Demographics

More than 50% of the references were related to the medical profession (n = 109). Other professions that have commonly utilized upward feedback include teaching and education (n = 39), nursing (n = 22) and management (n = 18). The majority of references included postgraduate participants (n = 106). Thirteen references included both undergraduate and postgraduate participants. A large proportion of references were from North America (Fig. 2).
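The proportions implied by these counts can be checked with a short calculation. The following is only a minimal sketch: it assumes the denominator is the 204 reviewed articles reported in the abstract, and the names `TOTAL_REVIEWED`, `profession_counts` and `share_of_total` are illustrative and not part of the original study.

```python
# Counts per profession as reported in the text; the total of 204 reviewed
# articles is taken from the abstract (assumed to be the denominator).
TOTAL_REVIEWED = 204

profession_counts = {
    "medical": 109,
    "teaching and education": 39,
    "nursing": 22,
    "management": 18,
}


def share_of_total(count: int, total: int = TOTAL_REVIEWED) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Percentage share of each profession among the reviewed articles.
shares = {name: share_of_total(n) for name, n in profession_counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Under this assumption the medical profession accounts for just over half of the references, which is consistent with the statement above. The four categories do not sum to 204, so the remaining references presumably fall under other professions.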


Types of studies and feedback

Studies were categorized according to the definitions in Table 2. Most references were evaluation studies (n = 176) and most studies were done for formative purposes (n = 172). A large majority of studies were quantitative (n = 152) and a high proportion of studies used paper surveys as a means of evaluating upward feedback (n = 124). Most studies (n = 162) only covered Kirkpatrick level 1, reaction. The median response rate was 76%, the median number of participants was 198 and the median duration of the study was 6 months. Only 1/3 of references addressed the outcomes of their study by developing an action plan. Furthermore, only 11 studies used controls to compare different interventions (Fig. 3).

Table 2 (excerpt)

Overt bias would be explicitly mentioned by the authors within the study. Implied bias would be identified by the reviewer as potential bias but was not mentioned within the study.

Action plans: Did the authors address the outcomes/consequences of the article? Was an action plan devised to address this?

16. Kirkpatrick levels: Which level? [18]
(1) Reaction: What do the raters think about their trainer/training/environment?
(2) Learning: Was the ratee able to learn from this feedback? This can be identified through mechanisms such as feedback reports, receiving results.
(3) Behavior: Did the ratee change their behavior due to this feedback? This can be reflected in repeat ratings.
(4) Results: Was there any improvement in teaching after receiving the feedback? Did others benefit from this improvement? For example, did exam rates improve? Did this change improve company profits?


Types of bias

Types of bias were separated into implied and overt bias. Implied bias involves factors that could potentially affect the upward feedback process but were not explicitly acknowledged within the article. Overt bias included factors affecting the upward feedback process that were mentioned within the article. A summary of the different types of bias found in this systematic review can be found in Table 3. Accountability and confidentiality were the most common biases recognized within references. On the other hand, the method of feedback, which involves the type of survey, the location, the use and methodology of reminders and the duration, was most commonly implied within articles but not explicitly acknowledged.


DISCUSSION

This review shows that multiple sources of bias have already been described in the important task of using feedback to assess training quality.


Feedback philosophy

Although there has been extensive research on upward feedback within an undergraduate classroom setting [2-7,9], the high proportion of references related to the medical profession and to postgraduate participants confirms the popularity of upward feedback in postgraduate medical training. The majority used surveys for formative purposes, which can provide the trainer/teacher with guidance on their current performance. The lack of studies for summative purposes could be due to raters tending to be over-lenient when upward feedback was for administrative purposes [14,17,39]. In contrast, Smith and Fortunato [16] found that rating purpose did not affect intentions to provide honest ratings, since raters may use the purpose as a tool to retaliate and reward their supervisors. Upward feedback could potentially be used as a tool to develop clinical trainers and to give guidance to clinical educators on their own career plans [47]. However, the effectiveness of upward feedback could be confounded by multiple factors, which will be discussed below. Most studies only evaluated Kirkpatrick level 1-reaction, which mostly involved surveying subordinates' views on certain topics. Only 10 studies covered Kirkpatrick level 4-outcomes [1,4,5,38,44,48-52].

Table 3. Different types of bias identified within the systematic review

Type of bias: Further information

1. Affect/leader-member relationship: Defines the relationship between ratee and rater [57,134]. The bias of liking someone may lead to potentially inaccurate ratings.
2. Motivation: Low response rates may not be representative of the sampled population. This could potentially be due to lack of motivation. Prior interests, including prior subject interest [4,30], could also affect participation and enthusiasm. For example, did students volunteer themselves to enter into the study? A response rate of 60% or more is perceived as an acceptable level [208]. Articles that explicitly mention rater motivations, enthusiasm or prior subject interests were also included.
3. Fear and retaliation, career progression: The fear that honest ratings could lead to retaliation and affect career progression could potentially affect upward feedback outcomes [12].
4. Self efficacy, lack of understanding/knowledge of upward feedback, role appropriateness: Do raters feel they are suitable/appropriate/confident to rate their superiors [11,17]?
5. Cynicism and trust, perceived usefulness: Raters may not feel their voice will be heard and may be skeptical that changes will be made according to their feedback [16].
6. Ingratiation, yea saying, leniency, reward anticipation/incentives: Raters may rate leniently as a means of showing ingratiation or to receive reward in return [11].
7. Method of feedback: This includes how the survey was implemented (e.g., paper, online), the location of survey implementation [115], whether any reminders were used and the method of reminders [55]. Also included is whether the survey was done over a period of time or only used 1 day/session [115].
8. Voluntary/compulsory: All members had to participate or could choose not to participate.
9. Frequency/timing, opportunity to observe: The timing of the survey: Was it done straight after rotation, many months after rotation, or in the middle of the rotation [201]?
10. Cultural/gender: Cultural differences may affect survey accuracy [78,119]. Gender could affect survey differences, e.g., nursing, where the survey population is predominantly female [83].
11. Halo effect: Raters have a tendency to give similar ratings to all aspects of a survey [11,57]. Raters are not able to differentiate between different traits.
12. End aversion/extreme response: End aversion: the avoidance of extreme ratings [11]. Extreme response: always rating very high/very low scores [11].
13. Survey fatigue: If there are multiple surveys to complete in the study or if the survey was very long, then this could affect survey accuracy.
14. Survey purpose: Was the survey for administrative or developmental purposes [11,41]? Why was the survey done?
15. Others: Potential biases that could also affect upward feedback but are not mentioned above, e.g., recall bias [201].

The majority of studies did not address the consequences or results of the study. This could be because it may be difficult to develop specific action plans based on Kirkpatrick level 1 evidence. Furthermore, very few studies specifically compared the different factors or their effect on feedback quality.


Study administration

Upward feedback usually involves subordinates appraising their superiors or training, hence it is not surprising that the majority of studies were evaluation studies. Only one study was a randomized controlled trial that stratified participants into 3 groups (online survey, simultaneous paper and online survey, sequential online and paper survey) [53]. This study found that the sequential survey method, in which online and paper surveys were administered at different times, gave the highest response rate but increased costs [53]. The small number of studies involving controls could be due to time and financial constraints. Controlled trials of educational interventions are rare, but more studies may need to include controls if we are to assess the efficacy of the different interventions. Without evidence for the effectiveness of interventions, it may be difficult for trainers to accept upward feedback from their subordinates. Tews and Tracey [49] showed that managers who participated in either self-coaching courses or an upward feedback intervention improved their interpersonal scores compared to controls. Managers who participated in the upward feedback training scored higher overall [49]. This could be due to the fact that upward feedback, if utilized appropriately, can facilitate information sharing, act as a refresher in order to avoid complacency and promote further development of skills [48]. Another form of support in upward feedback was the use of feedback reports, as demonstrated in Smither et al.'s [54] study. Feedback reports enabled managers to improve their managerial skills and also encouraged communication with their subordinates. However, adequate support with regular formal feedback to facilitate the process [48] may be difficult to orchestrate in medical training, where clinical educators work shift patterns. Moreover, the costs of facilitating upward feedback support may be quite high.

It is only in recent years, as the internet has become widely accessible, that online surveys have become more commonly utilized, which is why paper surveys were still the most commonly used feedback method within this review. Online surveys are cheaper and easier to administer than paper surveys and allow people to do the survey at a time that is convenient for them [55]. Scott et al.'s [53] study showed that although doctors in training did not give the highest response rates overall, trainee doctors gave the highest response rate when the survey was online. This may suggest the increasing role of online surveys in the newer generation of doctors. Furthermore, using online surveys to monitor training and trainers could allow the data to be more representative of the population of doctors in training.


Human factors in upward feedback bias

Affect describes the feeling of liking someone [56,57]. It has been thought that affect can lead to leniency because it can impair one's ability to evaluate someone objectively and rationally [58]. Al-issa found that students gave higher ratings to teachers with whom they got along [9]. Moreover, Antonioni and Park showed that leniency was more pronounced in both peer and upward feedback compared to downward feedback [56], suggesting that affect may play a role in both peer and upward feedback. In contrast, a study by Ryan et al. [59] found that recipients of feedback were more likely to accept feedback from those with whom they were already acquainted, and this finding was confirmed in another study [60]. This could suggest that supervisors may be more accepting of honest feedback and this may encourage subordinates who have a positive relationship with their supervisors to give honest feedback.

Antonioni [61] found that participants who were not anonymous when they gave upward feedback gave higher ratings compared to anonymous participants. Furthermore, fewer participants stayed in the study after finding out they were in the group which could be identified [61]. However, this study was implemented within an insurance company where upward feedback could potentially be used for summative purposes. This could lead to greater inflation in order to minimize the negative consequences. In contrast, upward feedback in medical training is more likely to be for formative purposes in order to further develop the clinical educator. Many studies have allowed upward feedback responses to be confidential due to potential rating inflation [3,4,7,12-15,17,22-24,26,28,34-39,43-45,47-50,52-55,57,58,61-142], hence accountability and confidentiality were the most commonly acknowledged type of bias found within this systematic review. In contrast, Roch and McNall [67], who investigated whether anonymity affected ratings, found that students who were not anonymous actually gave lower ratings compared to anonymous raters. Non-anonymous raters may feel more pressure to give high quality ratings [67]. So, there still may be a role for surveys in which subordinates may be accountable for their ratings. Furthermore, supervisors seem to be more accepting of accountable surveys [61]. Unfortunately, in potentially negative situations, anonymity seems likely to be the best policy.

Reward anticipation could be related to evaluation inflation. Previous studies have found that course grades can significantly predict student ratings [7,9], but the causation is unclear. Marsh and Roche [30] found that giving high grades was not related to higher student evaluation, but instead a lot of the variation within student evaluations could be accounted for by prior subject interest, higher and challenging workloads and learning. Furthermore, Abrami et al. [6] found that student grades were unlikely to have an effect on student ratings. The relationship between reward and ratings has been inconsistent and is subject to interpretation, hence the need for further research in this area.

Even if confidentiality concerns are addressed, participation may still be affected by fear and retaliation [10,12,15,61,62,132]. A mismatch between self-perception and upward feedback results could affect the acceptability and credibility of upward feedback since it threatens self-esteem [143]. Multiple factors can affect people's receptivity to feedback, including their motivation, fear and expectations [60]. However, if feedback is delivered appropriately and is perceived as valuable, then this can minimise the risk of negative emotions and dismissal of the feedback [60]. This is likely to require specialist input, e.g., counseling, which may have extra cost implications.

Cynicism and a lack of trust were not uncommon findings in both medical [45,53,55,58,137,142,144-147] and non-medical feedback [5,9,15-17,21,26,38-40,52,61,67,70,71,75,81,82,91,148-150]. If there is a discrepancy between self-ratings and upward feedback ratings [128,145], there is a possibility that the recipient may not find the feedback credible. Also, poorly designed surveys that lack useful feedback can lead to reluctance to change. Even trainees question the credibility of some of the feedback provided by their supervisors [151], hence it is likely that supervisors may do the same with feedback from trainees. Moreover, upward feedback, especially in an undergraduate setting, has been compared to 'popularity contests.' Aleamoni's [46] review article demonstrated that evidence supports the fact that students are able to judge the effectiveness of teaching. However, attitudes are harder to modify and this misperception may still lead to faculty being more resistant to change. This resistance could in turn affect raters' enthusiasm, especially if previous experiences of upward feedback led to no improvement.


Limitations

Although a comprehensive search was done, it may not be representative of all the data available on upward feedback. Also, a total of 35 articles shortlisted in the systematic review were not included in the results. There could potentially be other types of bias present in literature that was not reviewed within this systematic review. Moreover, we have identified a number of different biases that are involved in upward feedback; however, we have not investigated how these biases can be minimised. Further research will be required in order to determine whether these biases are interrelated and if it is possible to minimise the effects of different biases, especially human factors.


CONCLUSION

Upward feedback is a multidimensional form of feedback that can lead to improvement if facilitated and implemented appropriately. This systematic review has shown that multiple different types of bias can exist within upward feedback. The established literature acknowledges and suggests likely causes of bias, without thoroughly investigating their effect on feedback quality. This highlights how important it is for managers of training to consider factors such as survey method and intended uses when designing and interpreting feedback. Curre…ed approach with triangulation of methods seems to be the best way to evaluate medical training. Further research is required in order to evaluate which typ