Introduction

Gough (1953) conducted what is probably one of the earliest investigations of the role of personality in medical selection and education. His aim, to “contribute to a broader understanding of some of the non-intellective factors relating to academic achievement” (p. 361), undoubtedly still reflects the traditional approach to personality in medical selection. That is: (1) to identify traits based on their ability to predict a future outcome and (2) to select people on these traits, usually favouring higher scores (Ferguson et al. 2002). However, recent advances in personality theory question the validity of this tradition, with clear implications for medical selection practices. We address some of the myths and misunderstandings about personality that form the basis of the traditional model and suggest ways forward.

Developments in the conceptualization of personality

Myth/misunderstanding 1: traits as stable deterministic predictors

Whereas it is a widely held belief that traits are stable deterministic predictors of behaviour (see Roberts 2009, for an overview), current thinking from ecology (Dingemanse et al. 2010), economics (Ferguson et al. 2011) and personality psychology (Roberts and Jackson 2008) suggests the opposite. Traits are conceived as dynamically linked to context and biology, allowing for traits not only to change but also to influence the expression of behaviour (trait expression) across contexts.

Change and stability

Evidence demonstrates that traits change across generations (Twenge 2000, 2009), across the life course (Roberts et al. 2006; Caspi et al. 2005; Roberts and Mroczek 2008 for reviews), and as a function of proximal events: (1) environmental exposure such as university, work, and unemployment (Ludtke et al. 2011; Robins et al. 2005; Roberts et al. 2013; Boyce et al. 2015), (2) training (Jackson et al. 2012), and (3) therapeutic and psychological interventions designed to promote change (Tang et al. 2009; Hudson and Fraley 2015). These changes can be substantial, of the same magnitude as changes in economic markers such as income and wealth (Boyce et al. 2013). The degree of personality change is itself subject to individual differences (Ludtke et al. 2011) and is influenced by beliefs (e.g., about the changeability of personality; Robins et al. 2005).

Models such as the socio-genomic model of human personality aim to explain personality change within generations by highlighting the dynamic interaction between biology, environment, states (emotions, cognition, and beliefs), and traits (Roberts and Jackson 2008). According to this model, personality change is brought about both by (1) the influence of the environment on the organism’s biology and (2) states acting to mediate the link between biology and trait change. Similarly, both evo-devo models (a synthesis of evolutionary and developmental biology, whereby in some cases development ‘biases’ the adaptive quality of a phenotype: see Laland et al. 2011, p. 1514 and Toth and Robinson 2007) and niche construction theories (whereby organisms act as ‘eco-engineers’ and adapt and change their environments in such a way that the evolutionary gains are passed on: see Odling-Smee et al. 2013) indicate that traits change both within and across generations (Laland et al. 2011). These models suggest that traits lead organisms both to select the environments they operate in and to change or shape those environments, all of which leads to trait change. Indeed, selection, shaping, and socialization (traits change as a function of experiencing environments) have been observed for human personality (Ludtke et al. 2011; Wrzus et al. 2016; Caspi et al. 2005). There is also evidence for the “corresponsive principle”, whereby the trait that drives selection into an environment is the trait that changes the most in response to the selected environment (Roberts et al. 2013). As Caspi et al. (2005) succinctly put it, “… the most likely effect of life experience on personality development is to deepen the characteristics that lead people to those experiences in the first place … For example, if people assume more leadership positions because they are more dominant, then they will become more dominant through their experience as leaders.” (p. 470).

A hybrid model, combining these approaches (Fig. 1), indicates that traits, as well as being influenced by the environment via biology, also influence environments across developmental time—we choose the environments we operate in, change them and our traits consequently change as a function of these experiences. Certain trait changes that increase fitness may be selected for and survive to the next generation.

Fig. 1

Hybrid socio-genomic model of personality and evo-devo/niche construction model, adapted and developed from Laland et al. (2011) and Roberts and Jackson (2008). The dotted lines represent the socio-genomic model (some of which is already present in the evo-devo models). Development/change of a trait is influenced directly via biology (including gene expression). States (cognitions, emotions, beliefs) also have an indirect influence on trait change via biology and behaviour, and reciprocal links to the trait itself. The environment influences the type of behaviour expressed, with the trait influencing this directly and via states. The trait can influence the selection and shaping of future environments. Socialization will then influence trait behavioural expression and the trait itself via environmental constraints and state processes that influence the interpretation of behaviour in a given context. These developmental changes in personality may be selected for (if sexually desirable or functionally adaptive for survival) and survive to the next generation.

Context specific prediction

Whereas the developments above deal with changes in people’s personality traits across the life span, within generations, and in response to context, other developments emphasize how the expression of behaviours associated with traits (trait expression) changes within individuals across situations. Roberts (2009) suggests that traits represent a function that determines the probability that a person will act in a particular way given a particular context. This resonates with behavioural reaction norms (BRNs), used to conceptualize personality in ecology (Dingemanse et al. 2010). A BRN assesses trait-relevant behaviours at multiple time points across a context that varies along some graded parameter (e.g., stress: a contextual gradient). The mean value across contexts indicates how the organism typically behaves (personality), and the variance/covariance across contexts denotes how the organism adapts and changes (plasticity).Footnote 1 Fleeson and colleagues (see Fleeson 2004; Fleeson and Law 2015) applied a similar approach to human personality: the density-distribution approach. Here, multiple assessments of trait-relevant behaviours (e.g., for conscientiousness: ‘During the last half hour, how hardworking have you been?’) are collected across contexts. Both personality and plasticity were observed (see Fig. 2). Furthermore, typical behaviour on one day predicted typical behaviour the next day (stability). In relation to assessment with traditional personality scales, Fleeson et al. showed that the distribution shifts to the right for those who score high (+1 SD), compared to low (−1 SD), on a standard personality scale (see Fig. 3). So, for example, highly conscientious people will still express some low conscientiousness at times. Finally, scores on traditional personality scales were positively correlated with the mean and extreme of the distribution.
Thus, individuals have a stable mean (typical behaviour) within their own distribution of trait expression, and this mean correlates with their score on a personality test. In sum, personality reflects typical behaviour, stability, and plasticity across contexts, whereas traditional personality inventories pick up only typical behavioural tendencies, not plasticity.

Fig. 2

Phenotype and phenotypic expression

Fig. 3

Phenotype and phenotypic expression as a function of high and low trait scores

Implications for medical selection research

At present, there are no data on how personality changes as a function of medical training. Such an examination is important for predicting the adaptability of medical students and physicians to a career in medicine. If medical training changes levels of conscientiousness, for example, does it make the trainee or physician more or less conscientious? Decreases in conscientiousness have been linked to poorer health (Human et al. 2012; Jokela et al. 2014), which will ultimately affect practitioner performance and the predictive validity of conscientiousness. The traditional approach does not consider personality change (Ferguson 2013), but it is clear that personality change needs to be part of any predictive model.

Trait expression and the context-specific prediction from traits also imply that, depending on context, people high on a trait such as conscientiousness may be more or less likely to express the associated behaviour. For example, conscientiousness may be a good predictor in a context where high conscientiousness is expressible (e.g., an MCQ exam or an OSCE) but less so in a context where it is not (e.g., patient interaction). Thus, the predictive validity of the trait is context dependent, and we need to know not only about differential prediction across the medical career but also how this influences, and is influenced by, personality change.

Myth/misunderstanding 2: the dark and bright sides

There is an assumption that some traits are “good” (conscientiousness) and others are “bad” (neuroticism). However, recent findings show that “good” traits have a “dark-side” and “bad” traits a “bright-side” (Boyce et al. 2010; Ferguson et al. 2014). For example, neuroticism has anxiety as one of its facets, which brings benefits in terms of increased vigilance to danger but carries a cost: negative reactions to stress (see Nettle 2006 for an evolutionary explanation). Developing this idea further, Widiger and Mullins-Sweatt (2008) suggest that the Big 5 traits may be related to behaviours that are either maladaptive or normal, each further divided into high and low. Thus, maladaptive high conscientiousness (rigid, single-minded), maladaptive low conscientiousness (careless, sloppy), and normal low conscientiousness (disorganized) would all hinder good performance, whereas normal high conscientiousness (organized, resourceful) would enhance it. This suggests that a curvilinear function links the trait to performance: too little and too much conscientiousness may both be problematic, whereas just enough conscientiousness (the “Goldilocks hypothesis”) is the optimal solution (Martin and Keyes 2015).

Implications for medical selection research

The implication is that “good” traits typically seen as key to select on in medicine, like conscientiousness, may carry a dark-side. Indeed, there is evidence for this in medicine (Ferguson et al. 2003, 2014). Ferguson et al. (2014) showed that conscientiousness was a negative predictor of clinical knowledge, but that a little anxiety aided clinical skills development. This idea can be extended to a trait like empathy, which is seen as a good trait to select on in medicine because it indicates a positive value and virtue required in medical staff. However, there are costs associated with empathy in terms of susceptibility to psychological illness, reduced pain thresholds, and psychopathology (see Ferguson 2016). Thus, while virtues and values are undoubtedly key to a caring medical system, selecting on such a ‘good’ trait without considering, and testing for, the potential dark-side of empathy may be too simplistic and even counterproductive. Indeed, recent evidence shows that empathy in carers may carry costs (see Manczak et al. 2016). Furthermore, the Goldilocks hypothesis suggests that selection should focus on the optimal level of a trait. The issue is how to identify that optimal level.Footnote 2
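One conventional way to look for such an optimal level, offered here as an illustrative assumption rather than as the method of Martin and Keyes (2015), is to fit a quadratic regression of performance \(P\) on trait level \(T\). The coefficients \(\beta_0, \beta_1, \beta_2\) are hypothetical notation:

```latex
\begin{aligned}
\mathbb{E}[P \mid T] &= \beta_0 + \beta_1 T + \beta_2 T^2, \qquad \beta_2 < 0 \\
\frac{d\,\mathbb{E}[P \mid T]}{dT} &= \beta_1 + 2\beta_2 T = 0
  \;\Longrightarrow\; T^{*} = -\frac{\beta_1}{2\beta_2}
\end{aligned}
```

A reliably negative \(\beta_2\) indicates an inverted U, and \(T^{*}\) is the ‘just enough’ trait level. With hypothetical values \(\beta_1 = 0.6\) and \(\beta_2 = -0.1\), for instance, \(T^{*} = 0.6/0.2 = 3\) on the trait scale. Whether \(T^{*}\) falls inside the observed range of scores should be checked before treating it as a selection target.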

Developments in personality measurement: some solutions for medical selection practice

Our review above suggests that personality assessment should, at the very least, allow for some assessment of ‘contextual sensitivities’ or ‘trait expression’. In addition, attention should be paid to both the bright and dark sides of personality traits. Do new developments in personality assessment offer any immediate solutions to these conceptual issues?

Contextualized measures of personality

Contextualized personality inventories typically add a tag to existing generic items. Examples of such tags are “at-work” or “at-school” (Lievens et al. 2008). A recent meta-analysis in the employment domain (Shaffer and Postlethwaite 2012) revealed that adding such simple tags to a generic personality scale substantially raised the validity of the personality scores for predicting job performance.

Another approach is to focus on context-specific motivations linked to a trait. This approach to contextualization is exemplified by the Five Individual Reaction Norms Inventory (FIRNI) “…which conceptualizes the Five-Factor Model dimensions as stable individual differences in people’s motivational reactions to circumscribed classes of environmental stimuli” (Denissen and Penke 2008: p. 1297). For example, the FIRNI item “When I am acting on a plan I do not easily let myself be distracted by short-term needs” could be re-written as “When dealing with patients I would not easily let myself be distracted by short-term needs.”

Implications for medical selection research

It may seem straightforward to add contextualized tags such as “when dealing with patients” to traits like conscientiousness (Jackson et al. 2010). However, these tags need to be carefully considered. For example, which contexts should be tagged? Also, applicants (unless they have had extensive work experience) would not be able to say how they would typically react in such a context (“When dealing with patients I am a very competent person”). Last, the specificity of the tags used should probably mirror the outcome measures that one wants to predict. That is, general outcome measures (life satisfaction, well-being) are best predicted with traditional generic personality inventories, whereas narrower, more fine-grained outcome measures are better predicted with contextualized inventories.

Trait expression

Situational judgment tests (SJTs) offer even more possibilities for measuring personality in relation to situations. SJTs present realistic, job-related situations and ask participants to indicate what should be done to handle each situation effectively (McDaniel et al. 2007). Owing to their predictive validity and diversity benefits, SJTs have made inroads in medical education and selection (e.g., Libbrecht et al. 2014; Lievens and Patterson 2011).

A new development consists of using SJTs designed to infer people’s standing on personality traits (Motowidlo et al. 2006; Lievens and Motowidlo 2015). Such SJTs present a situation that activates a specific personality trait (e.g., agreeableness) and then list response options that differ in their level of agreeableness. The underlying logic is that people who score high on agreeableness will be better able to discriminate between the options because they possess more accurate beliefs (referred to as implicit trait policies; Motowidlo et al. 2006) about what an effective agreeable reaction is in that situation. Thus, implicit trait policies capture procedural knowledge about which trait-linked behaviour would be effective in that context.

Implications for medical selection research

So far, SJTs assessing implicit trait policies have not been adopted in medical education (but see Ghosh et al. 2015). Such contextualized personality approaches are, however, promising and deserve more recognition and application in medical education.

Other-reports and implicit personality measures

Whereas the two aforementioned assessment approaches link to some of the key theoretical advances highlighted in this paper, two other methods (other-reports and implicit personality measures) could complement them. Over the last decade, these two methods have received substantial research attention in the personality and personnel selection domains, but not in the medical education domain.

Other-reports of someone’s personality, by a person well acquainted with the target across a diversity of situations, shed light on that person’s reputation (Hogan and Shelton 1998). Other-reports are typically obtained with the same inventories as self-reports, and two meta-analyses showed that other-reports of personality are valid and add substantial incremental predictive validity over self-reports (Connelly and Ones 2010; Oh et al. 2011). The evidence reviewed above shows that self-reported personality can change in response to environmental contingencies and events. If an individual’s self-reported personality changes with respect to environmental factors (e.g., life events, exam stress), the question is whether well-acquainted peers are able to pick this up and report similar changes in the target’s personality, or whether the change is not necessarily apparent to observers. It may be that some people are able to manage the expression of personality-relevant behaviour change so that acquaintances do not pick it up, whereas others are not. Differences between peer and self-reports of personality with respect to change may therefore provide an index of reputation and/or impression management. At present, there are limited data on how changes in other-reports link to changes in self-reported personality (Jackson et al. 2009).
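The proposed index can be sketched as a difference between change scores: self-reported change minus peer-reported change over the same period. The ratings, the two time points, and the function `change_discrepancy` below are hypothetical, and this simple difference is only one possible operationalization.

```python
# Hypothetical conscientiousness ratings (1-5 scale) for three students,
# given by the student (self) and by a well-acquainted peer, before (t1)
# and after (t2) an exam period. Values are invented for illustration.
ratings = {
    # student: (self_t1, self_t2, peer_t1, peer_t2)
    "s1": (4.2, 3.6, 4.0, 3.5),  # peer reports a similar drop
    "s2": (4.1, 3.5, 4.0, 4.0),  # peer reports no change
    "s3": (3.0, 3.2, 3.1, 3.3),  # both report the same small rise
}


def change_discrepancy(self_t1, self_t2, peer_t1, peer_t2):
    """Self-reported change minus peer-reported change.

    Near zero: the change the person reports is also visible to the peer.
    Far from zero: self and peer disagree about change, a candidate
    (hypothetical) index of reputation or impression management.
    """
    return (self_t2 - self_t1) - (peer_t2 - peer_t1)


for student, scores in ratings.items():
    print(student, round(change_discrepancy(*scores), 2))
```

Here student s2 reports a drop that the peer does not see, giving the largest discrepancy. Difference scores of this kind are known to have limited reliability, so in practice a model that estimates self and peer change jointly would likely be preferable; the sketch only makes the logic of the index explicit.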

The key difference between implicit and explicit measures of personality is that the latter (e.g., the NEO-PI) ask someone directly to describe their personality, whereas the former infer it. For example, people are asked to make associations, complete sentences, or arrange pictures, and their responses are used to infer their personality (Uhlmann et al. 2012). Within a dual-system framework, implicit measures tap intuitive and fast processing, whereas traditional explicit measures assess slow and deliberative processing (Strack and Deutsch 2004). As such, implicit and explicit personality measures tap different processes. Indeed, associations between implicit and explicit measures of the Big 5 are low, but generally strongest for neuroticism, extraversion, and conscientiousness (see Grimm and von Collani 2007; Schmukle et al. 2008; Steffens and Konig 2006; Vianello et al. 2013). Implicit personality measures have the potential to add information over and above explicit personality measures (Back et al. 2007; Lang et al. 2012; Uhlmann et al. 2012), predict spontaneous behaviours (Steffens and Konig 2006), and are less susceptible to faking good (Vecchione et al. 2014). We need to know the extent to which such implicit measures also show change across and within generations and are influenced by context.

Implications for medical selection research

In medical selection, other-reports based on Five-Factor Model inventories have not been used. Yet, they could be operationalized through references and letters of recommendation that are structured based on the Five-Factor Model of personality (for an example in personnel selection, see Taylor et al. 2004).

With respect to implicit measures, the main concern is often their practicability in actual high-stakes selection, as they involve lengthy computer-based reaction-time assessments. However, one specific type of implicit measure, namely conditional reasoning tests, might offer a solution (Berry et al. 2010). Here, test-takers are presented with situations and possible responses, interwoven with traditional reasoning items, and are asked to select the response that follows most logically from the situation. In a conditional reasoning test of aggression, for instance, the four responses refer to different justification mechanisms, some of which indicate more aggressive tendencies in the endorser; repeated endorsement of such response options is taken as indicative of aggressiveness. Reviews show that the predictive validity of conditional reasoning tests of aggression is similar to that of self-report personality measures (Berry et al. 2010), and that they are not susceptible to faking good unless people are told that the test measures aggressiveness rather than conditional reasoning (LeBreton et al. 2007). So, under these circumstances, conditional reasoning tests might be useful additions to the existing tests used in medical admissions.
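The scoring logic described above, counting repeated endorsements of aggression-justifying options while ignoring the interwoven reasoning items, can be sketched as follows. The answer key, item numbers, and function name are invented for illustration and do not reproduce any actual conditional reasoning test.

```python
# Hypothetical answer key (invented; not from any real test). For each item
# number, the option letter that expresses an aggression-justifying
# rationale, or None for the interwoven conventional reasoning items.
AGGRESSIVE_OPTION = {1: "B", 2: "D", 3: None, 4: "A", 5: None, 6: "C"}


def aggression_score(responses):
    """Count endorsements of aggression-justifying options.

    `responses` maps item number -> chosen option letter. Conventional
    reasoning items (key None) and unanswered items do not count.
    """
    return sum(
        1
        for item, key in AGGRESSIVE_OPTION.items()
        if key is not None and responses.get(item) == key
    )


applicant = {1: "B", 2: "A", 3: "C", 4: "A", 5: "B", 6: "C"}
print(aggression_score(applicant))
```

Because the test-taker believes every item is a reasoning problem, a higher count is read as a stronger implicit tendency to justify aggression rather than as a deliberate self-description.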

Another way of assessing implicit personality is to adapt a technique introduced by Quirin and colleagues (see Quirin and Bode 2014; Quirin et al. 2009). In this procedure, participants indicate the extent to which words from an artificial language (e.g., Tunba) sound like a series of mood adjectives (happy, energetic, helpless, etc.). This could easily be extended to the Big 5 using Goldberg’s adjective markers (Goldberg 1992). So far, however, we do not know of any colleges using these tests in medical selection.

Predictive validity issues and trait change

First and foremost, the above shows that stable trait expression, as indexed by the score on a personality scale, can change as a function of context. At present, there are no data on how personality changes as a function of medical education. This is a fundamental gap in our knowledge, as the degree and extent of behaviour change have profound implications for predictive validity. If the level of a trait changes across medical training and career as a function of that training, then at the very least multiple assessments of the trait are needed to help untangle cause and effect. Training performance (TR) at any one point will be a function of the initial trait score (\(t_n\)), the trait assessment proximal to the performance estimate (\(t_{n+1}\)), and the previous training environment (\(TR_{n-1}\)). Previous training is used to model the ‘learning backbone’ along which traits have their effect (Ferguson et al. 2014). Furthermore, the level of the trait at \(t_{n+1}\) will depend not only on the initial trait level (\(t_n\)) but also on the effect of any previous training environment (\(TR_{n-1}\)). This results in the simple model in Fig. 4. Thus, the predictive validity coefficients can be expressed for \(t_n\) and \(t_{n+1}\) (which can be used to control for initial levels and residual change), controlling for previous training. We can then ask (1) whether trait scores at selection (\(t_n\)) predict equally across the training backbone (generalized predictive validity) or whether the strength or direction of prediction changes, (2) whether proximal assessments (\(t_{n+1}\)) have stronger predictive power than distal ones, or (3) whether it is the degree of trait change that matters (if so, it becomes important to identify predictors of change).
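If, purely for illustration, every path in Fig. 4 is assumed to be linear, the model can be written as two regression equations. The coefficients \(a_0, a_1, a_2, b_0, \ldots, b_3\) and the error terms \(e\) and \(u\) are hypothetical notation introduced here, not part of the original figure:

```latex
\begin{aligned}
t_{n+1} &= a_0 + a_1\, t_n + a_2\, TR_{n-1} + e \\
TR_n    &= b_0 + b_1\, t_n + b_2\, t_{n+1} + b_3\, TR_{n-1} + u
\end{aligned}
```

In this form, \(b_1\) and \(b_2\) are the validity coefficients for the distal and proximal trait assessments, controlling for previous training. Because \(t_n\) and \(TR_{n-1}\) are held constant, \(b_2\) reflects the effect of trait change not accounted for by them (residual change). Question (1) then compares \(b_1\) across successive stages of training, question (2) compares \(b_2\) with \(b_1\), and, if question (3) is answered in the affirmative, the \(a\) coefficients indicate what drives that change.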

Fig. 4

Structural model for incorporating personality change into predictive validity

Epilogue

There are three big gaps at present in the medical selection literature with respect to personality: (1) the lack of data on personality change and its implications for healthcare professionals’ health and performance, (2) the lack of assessment of trait expression and context sensitivity, and (3) no real recognition of the ‘bright’ and ‘dark’ sides of personality and, again, their implications for healthcare professionals’ performance. The intensity and nature of medical training are very likely to result in personality change (Ludtke et al. 2011). In addition, the changing medical training context means that ‘dark-side’ and ‘bright-side’ aspects of traits will have particular roles to play with respect to specific aspects of training (Ferguson et al. 2014).

The overall implication is that the traditional model of “selecting on a trait that has overall positive predictive validity” is itself of limited validity in the face of dynamic trait change and the context specificity of trait expression. At a minimum, if the traditional approach is used, analyses should be conducted to explore whether the trait has any ‘dark-sides’ in the context of medical selection. We also need to know how traits change in this context. If change is not factored into predictive models, these models may well artificially under- or overestimate validity coefficients. Also, by knowing which traits change, and for whom, we will be in a position to develop much stronger and more precise predictive models. For example, we might select the level of the trait that shows the least change but the highest overall predictive value. Or we might identify those for whom the greatest trait change is likely. If this change is likely to be detrimental, interventions can be put in place (e.g., treating personality change and its management as part of medical education as well as selection; Jackson et al. 2012; Hudson and Fraley 2015).

We also make a few recommendations concerning how currently available personality testing approaches may allow, to a certain extent, some of these conceptual issues to be incorporated. Our recommendations focus in particular on context specificity (SJT, FIRNI, etc.), and following them up will enhance the medical selection process. That said, the big questions of trait change, expression, and the dark-side still need to be explored.