Observable phenomena that reveal medical students' clinical reasoning ability during expert assessment of their history taking: a qualitative study

Background During their clerkships, medical students are meant to expand their clinical reasoning skills during their patient encounters. Observation of these encounters could reveal important information on the students’ clinical reasoning abilities, especially during history taking. Methods A grounded theory approach was used to analyze what expert physicians apply as indicators in their assessment of medical students’ diagnostic reasoning abilities during history taking. Twelve randomly selected clinical encounter recordings of students at the end of the internal medicine clerkships were observed by six expert assessors, who were prompted to formulate their assessment criteria in a think-aloud procedure. These formulations were then analyzed to identify the common denominators and leading principles. Results The main indicators of clinical reasoning ability were abstracted from students’ observable acts during history taking in the encounter. These were: taking control, recognizing and responding to relevant information, specifying symptoms, asking specific questions that point to pathophysiological thinking, placing questions in a logical order, checking agreement with patients, summarizing and body language. In addition, patients’ acts and the course, result and efficiency of the conversation were identified as indicators of clinical reasoning, whereas context, using self as a reference, and emotion/feelings were identified by the clinicians as variables in their assessment of clinical reasoning. Conclusions In observing and assessing clinical reasoning during history taking by medical students, general and specific phenomena to be used as indicators for this process could be identified. These phenomena can be traced back to theories on the development and the process of clinical reasoning.


Background
Clinical reasoning is considered to be the foundation of patient care, and, as such, clinical reasoning is part of every endeavour clinicians make in clinical care, enabling them to take the best justified action in a specific context [8].
Theories on clinical reasoning all point out that this process starts with data gathering, which initially takes place during history taking [1]. The performance of history taking, therefore, could tell us a great deal about the clinicians' skill to start diagnostic reasoning, which is the initial phase of clinical reasoning. If this step fails, it is hard to see how an accurate diagnosis might be reached. Physical examination, of course, could also provide additional information for defining a proper differential diagnosis, but, in most clinical encounters, this is a separate process that follows the initial history taking.
Clinical reasoning as a concept emphasizes the cognitive process involved rather than its endpoint, the diagnosis [15]. To reach a diagnosis, a connection must be made between existing knowledge and data obtained. If this connection is lacking, this lack will be reflected in how the process is carried out as, in essence, one may gather a substantial quantity of data and yet have no idea what they mean.
Monitoring the acquisition of clinical reasoning skills by medical students and other trainees in the clinical domain is essential. Clinicians are familiar with monitoring their trainees' clinical reasoning in the workplace and are often readily able to identify those who are poor at it. The most important point here is that there are no unequivocal criteria for assessing students' clinical reasoning during a clinical encounter and that such assessment, therefore, is often informal, implicit, not standardized and could be based on holistic or subjective judgments. What happens in practice is that, after the encounter and the write up, the teacher asks for a differential diagnosis and the elements that lead up to it [1]. In terms of patient care, this could be a careful way of acting, but from an educational point of view, it might not be accurate enough. What is actually assessed in this case is the endpoint of the thought process that results in a clinical decision or clinical judgment. Before that endpoint is reached, however, the process of diagnostic reasoning takes place, which includes the cognitive processes leading to the endpoint. If it is only the outcome that is discussed, these cognitive processes go unnoticed and uncorrected [15].
Providing adequate and targeted feedback enables further development. To be able to provide feedback, teachers must observe and assess what students are doing. Providing feedback on clinical reasoning and assessing it, therefore, are intertwined elements of the same process. If this is not properly observed, therefore, feedback might be inadequately targeted and fall short of its intended purpose [9]. Many clinical exercises in which students take a history are ideally suited to formally assess the initial part of the clinical reasoning process, so-called diagnostic reasoning [11]. Although the importance of this procedure is well recognized, the assessment of clinical reasoning in these encounters in clinical practice still appears to be a major struggle for clinicians [4,5].
Identifying observable phenomena representing the cognitive processes of diagnostic reasoning would allow us to gauge students' skills during the process of clinical reasoning. This would give us the tools on which to base feedback and assessment, with the purpose of improving the development of these skills in students. The question then arises how we can reliably assess what is actually shown by students as an expression of diagnostic reasoning during history taking. To be able to answer this question, we should first know what can actually be observed regarding clinical reasoning in the learners' behaviour during history taking.
To assess competence in clinical reasoning, test situations should be as realistic as possible and should be based on sufficient numbers of clinical examples [17]. These two ingredients are abundantly present during clinical clerkships [1]. Most of the initial information needed for a proper differential diagnoses is acquired by careful and adequate history taking [7]. However, if diagnostic reasoning is to be observed, it must be deduced from students' observable behaviour [10], which could lead to large interrater variability. In addition, a clinical setting cannot be standardized, like case presentations in an OSCE, whilst it is known that even small changes in context or content can affect clinical reasoning performance [2]. This phenomenon, referred to as 'context or content specificity' , is often seen as a major obstacle to assessing medical students' clinical reasoning as it precludes generalizability. On the other hand, it is exactly the ever-changing contexts in daily medical practice that make clinical reasoning such a real challenge, even more so as the object of assessment [3]. Therefore, we need to define the phenomena and criteria that reflect the process as an observable action.
We hypothesize that it is possible for experts to recognize and identify phenomena and patterns that indicate proper diagnostic reasoning and that, by identifying these, valid assessment criteria for diagnostic and clinical reasoning in clinical practice could be developed. A necessary prerequisite for the use of such criteria is that they should be in accordance with what are considered to be the generally accepted concepts and theories of clinical reasoning.
Hence, our main research question is: Which observable phenomena serving as indicators for clinical reasoning do experts identify by observing students during history taking?

Setting and sampling
The study was conducted in our university medical center. The present curriculum is a six-year competencybased curriculum with a Bachelor's and a Master's programme. Clinical reasoning skills are taught throughout the Bachelor's programme. An internal medicine clerkship is part of the first year of the three-year Master's programme, which involves training in clinical practice. The internal medicine outpatient clinic has two consulting rooms that have been prepared with recording systems; these are used by students on a daily basis to record their consultations for training purposes.
Principal lecturers (PLs) participated in this study. The Radboud university medical center appoints excellent medical teachers who, as principal lecturers, play a leading role in teaching and developing medical education. All are practicing clinical physicians. The acquirement of PL status allows them to dedicate more time to developing medical education in all its facets. The department of internal medicine has seven principal lecturers among its faculty. One of the principal lecturers (CP) was too closely involved with the design of the study to be included as a potential participant. All other PLs, three males and three females, were invited to participate on a voluntary basis, to which all agreed.

Design
To be able to contribute theory formation about assessment of clinical reasoning, we chose a grounded theory methodology [16], which addresses knowledge gain through the construction of new theories that are grounded in a systematic analysis of the data.

Data collection
The PLs were invited to participate in one-hour research sessions. These sessions took place in a quiet room with only the researcher and the PL present. The six PLs studied two different recordings of medical students' history taking in a new patient encounter. These consultations by the students were recorded during the last week of their internal medicine clerkships. Prior to recording, patients were asked for their permission to use the recordings for educational purposes. Twelve recordings were randomly chosen from a larger number and were screened briefly to prevent inclusion of similar case presentations. The PLs were asked to watch the process and stop the recording when they thought they noticed the clear presence or absence of signs of overt clinical reasoning. At those points, the PLs were prompted to think aloud on what they observed and how they noticed that the cognitive process of clinical reasoning was active or lacking. Recordings could also be interrupted by the researcher when stimulation of the think-aloud process was considered necessary. The participants were not prompted to rate or judge students. During this sessions audio recordings of the think-aloud process were made for data analysis.

Analysis
Data analysis was started during the data collection process so that emerging themes could be explored in more depth with subsequent participants. The audio recordings were transcribed to written text for further analysis. Initial themes were explored (open coding), alternated with exploration of interrelationship amongst these themes (axial coding) to identify patterns and main categories in the data and stimulate conceptual understanding [16]. This process was carried out by the principal researcher (CH). Renaming of themes and determination of relationships between the themes were done by the principal researcher in collaboration with the research team, leading to the development of a conceptual framework. After analysis of all twelve interviews, data saturation was discussed and approved by the research group.

Validation
The emerging themes were presented and discussed with the participants to ensure that these resonated with their perceptions. Their vision was used to refine the conceptual framework.

Indicators for clinical reasoning
Whilst watching the recordings, the PLs saw general indicators for the process of clinical reasoning emerging at four different levels of the encounter. These were: I students' acts; II patients' acts; III the course of the conversation; and IV the result of the conversation and efficiency (Table 1). Different indicators could appear per level. All indicators emerged as themes in half to all of the interviews with the participants (Table 1).

Level I students' acts
Most comments on indicators of clinical reasoning were linked to the students' performance. In general, the students had to show that they were actually processing the information and were making choices during the conversation. The students were expected to show curiosity and play an active role in the conversation.
Taking the lead in the conversation Clinical reasoning was assessed by the way students took the lead in the conversation. Students had to show that they could direct the conversation towards the information they needed to know. This did not mean that clinical reasoning was assessed negatively when students allowed patients to tell their own story, but if students failed to actively take the lead at a certain moment, they did not show that clinical reasoning was taking place.
Recognizing and responding to relevant information Students had to show that they recognized relevant information by the way they responded to this information. If the expert assessors recognized an important clue in the patients' story and the students also responded to this clue with interest, the observer perceived that clinical reasoning was taking place. Not responding to clues that were marked as important by the expert assessors was interpreted as the absence of clinical reasoning or as wrong clinical reasoning.
Specifying symptoms When students had identified a leading health problem, the one that was worth pursuing, it had to be explored in depth. By showing eagerness and curiosity to specify symptoms that seemed important, students showed clinical reasoning according to the expert assessors. This included localization of the health problem and its evolution in time. Specifying symptoms could also be assessed negatively when their relevance was doubted by the expert assessors.
Asking specific questions pointing to pathophysiological thinking Students could ask questions that were not direct responses to what the patients said, but that were evidence of hypothesizing about specific causes of a health problem. The assessors might have recognized this hypothesis by the specific questions asked. Instead of asking specific questions pointing to a specific diagnosis, some students appeared to be just reproducing lists of questions they had learned to use. If they did this too often, a negative assessment was given for clinical reasoning.
Putting questions in a logical order Questions should be placed in a logical order. If the students failed to do so, the assessors did not recognize that the students were applying clinical reasoning.
Checking with patients Based on comments of the expert assessors, students should show that they were checking with patients to be sure that their clinical reasoning was going in the right direction. The assumption here is that appropriate clinical reasoning is only possible when it is based on correct data. Principal lecturers expect their students to check constantly whether they are gathering the right information. Not only did they have to ensure that they understood patients, but they also had to verify that patients understood them so as to be able to make differential diagnoses based on the correct information.
Summarizing Students are taught to summarize aloud when they have obtained a sizable quantity of information. Such summaries are used by the principal lecturers to judge clinical reasoning.
Body language Body language was also an element to be observed in clinical reasoning. For example, the assessors recognized confusion on the students' faces or in their postures.

Level II patients' acts
Assessors could interpret patients' acts as indicators for students' clinical reasoning when the students' clinical reasoning ability was reflected by the patients' acts. When patients felt that students did not understand the problem, patients could show a tendency to take over the leading role. This was interpreted as a negative sign for clinical reasoning. 'There's a misunderstanding between the doctor and the patient. The patient is pointing at her skin and mentions pain in her leg, but the student is talking about her knee problem.' 'By the way, we don't know now whether he understands it all, and they talk at the same level. He's talking about the period before Christmas, and he thinks she describes the symptoms she had when she was first using milk products. But I'm not sure whether she is talking about that period.' Students' acts as well as patients' acts influenced the course of the conversation. Some phenomena, or patterns, might appear that were caused by acts of either the students or the patients and that reflected the students' clinical reasoning ability. When students and patients were talking at cross purposes, data gathering was believed to be insufficient. Repetition either induced by students or patients could occur during the conversation and was considered a negative indicator for clinical reasoning.

Level IV data gathered and efficiency
The quantity and quality of gathered data as well as the speed of data gathering (efficiency) were regarded as indicators of clinical reasoning being in progress. If students took too long to get to an essential point, their clinical reasoning ability was assessed negatively. This was also mentioned for the quantity and quality of data that had been gathered by the end of the session. If, according to the assessors, students failed to obtain a sufficient quantity of data to make a proper differential diagnosis, their clinical reasoning was assessed negatively.

Performance assessment of clinical reasoning
The PLs were not asked to rate or assess students actively. However, pointing out the presence or absence of indicators for clinical reasoning during assessment of clinical reasoning often automatically resulted in assessment of clinical reasoning. By using a grounded theory approach, information about the assessment of competence in clinical reasoning emerged, which led to a conceptual framework (Fig. 1). Correcting for context factors, using self as a reference and undefined feelings and emotions were the three themes that influenced the assessment of competence in clinical reasoning by the expert assessors, as will be explained below.

Correcting for context factors
When expert assessors felt the need to give an opinion about the quality of clinical reasoning that was presented to them, they often corrected their assessment based on context factors. Context factors that could be derived from this study were: patient factors, disease factors, student factors and other external factors. When context factors appeared to make clinical reasoning more difficult, for example when the student gets interrupted by external factors, or when the patient is verbose, it affects the judgement of the assessors.

Using self as a reference
The assessment of the experts appeared largely to be influenced by choices they thought they would make themselves at a certain point in history taking. By taking themselves as a frame of reference, assessors compared the student performance with how they perceived themselves to practice.

Emotions/feelings
Although many observable phenomena serving as indicators for assessing clinical reasoning could be distinguished, not all observable phenomena could be put into words by the expert assessors. Often observations then were expressed as emotions or feelings. Assessors for example expressed that they could feel that clinical reasoning was happening.

Discussion
In this study, expert physiciansas expert raters of clinical reasoningidentified a set of indicators that capture basic elements of students' diagnostic reasoning during a clinical encounter, in particular history taking. These indicators can be used to assess this process in a structured and reliable way in the setting of the presumptive diagnostic reasoning early in the clinical encounter.
Although diagnostic reasoning can be viewed from different epistemological angles [4,5], in this context it is conceived as a cognitive process of drawing conclusions or making inferences from the facts or premises obtained early on in the process of clinical reasoning. By identifying the criteria by which this early stage of clinical reasoning can be observed and assessed, a tool can be developed specifically targeting this important point in medical students' clinical training.
The more one studies clinical experts, as some authors observe, the more one marvels at the complex and multidimensional components of knowledge and skill they bring to bear on the problem and the amazing adaptability they must possess to achieve the goal of effective care [12]. Fundamentally, this can be defined as a complex cognitive process that involves formal and informal thinking strategies to gather and analyze patient information, evaluate the significance of the information and weigh alternative actions [15]. This is a dynamic process; information that is initially discarded may be retrieved later in the process. As experience builds up and pattern recognition becomes an integral part of the process, this allows for faster procedures. Novices may be more limited to relying on prototypical or analytic knowledge and information processing, having fewer exemplars or experiences to draw upon [14].
This process of data handling, searching for clues and significance and gathering and analyzing data obtained, is reflected in the students' clinical diagnostic reasoning during history taking. This is the initial data gathering process. On the basis of these considerations, we could probably identify indicators of the process of clinical reasoning (or diagnostic reasoning, at this initial stage), regardless of the patients' context or the problem they present. These indicators are the students' observable actions to acquire meaningful information. The proper use or expression of these indicators could then enable us to standardize and thus generalize the way in which expert clinicians could judge medical students' clinical reasoning in this respect. This could provide clinical teachers with more reliable and structured clinical assessment and feedback methods to monitor the performance of clinical reasoning in the initial stages of clinical training.
It has been observed that the role of inferences is prominent in performance assessment [10]. Inferences are conclusions that come at the end of a chain of reasoning. We show that inferences can be based on recognition of the prespecified objective and, therefore, on more reliable criteria. As a result, such criteria and patterns can be recognized by different experts observing different students in different contexts. Others have also suggested that these inferences can in fact be markers of expertise, which represent the assessors' ability to tap into well-developed assessment schemes [18]. In our study, we have identified workable criteria to judge the process of information gathering and analysis that forms the basis of the medical students' clinicaldiagnosticreasoning process. These criteria, as formulated by the clinical experts, are in line with the existing concepts of the process of clinicaldiagnosticreasoning.
It is no surprise that the students' acts would play a major role in the observable indicators for clinical reasoning. When patients take the lead, however, the students' acts are also thought to be a reflection of their lacking clinical reasoning ability. Our main research question involved a search for common observable indicators for clinical reasoning, but our research method, involving a think-aloud process, apparently also stimulated performance assessment, even though the participants were not actively stimulated to rate or assess the students. In line with current literature [10], we found that expert assessors constantly incorporate context factors into their assessment and use frames of reference. When it comes to clinical reasoning, however, it is notable that expert assessors mainly appear to consider themselves as a reference. The extent to which assessors are capable to correct their judgement for context factors and to apply their own frame of reference to the student's performance might be the reflection of their expertise. In social perception theory, assessors or raters can be seen as social perceivers who use motivated social assessments when evaluating performance and use preexisting knowledge structures or schemes when assessing performance. These schemes are based on the expectation that assessors evaluate the students' behaviour in a certain social position, in specific social situations or as persons [6]. Hoping to improve the assessment of clinical reasoning in practice, we feel that research should also focus on how these internal frames of reference or internal standards for assessing clinical performance develop. We need to improve our understanding of the underlying processes that influence the development of internal standards that are used for assessing performance. This knowledge could be helpful in training performance assessment and could serve as a basis for personalized feedback in the professional training of students and residents.
Ultimately, all medical students need to be assessed on whether they are evolving satisfactorily as medical professionals, an essential part of which is the development of appropriate clinical reasoning skills. One student failing to show clinical reasoning ability was strikingly described by one of our expert assessors as 'someone who is engaged in social talk but not in professional exchange'. In this study, we could outline indicators that experts can use to distinguish clinical reasoning from 'social talk'. This might be a very useful supplement for clinicians to assess and stimulate the improvement of their trainees' clinical reasoning ability. Our next step is to asses validity and reliability of these indicators when they are used to assess students'clinical reasoning ability in clinical practice.

Limitations
Though the number of participants was limited, all themes emerged from half to all of the participants, and no new themes emerged; data saturation, therefore, was considered complete. We are aware that cultural differences might have influenced the way in which assessment of clinical reasoning was addressed by expert assessors. The conceptual framework, therefore, could be enriched by repeating the study in other countries or other medical disciplines.
There were large variations in the number of times that themes appeared during different sessions. This could be due to variety amongst the participants as well as variety among the presented cases. We are well aware that assessment only by observing the students has limitations. The students' explicit thought process during the clinical encounter was not an integral part of the assessment. Therefore, we do not advocate assessment of clinical reasoning from this perspective only. Obviously, observation also takes precious time. Different methods could complement each other. Direct observation, however, is not always part of daily routine as it should be and takes place only rarely during most clerkships [13].
The study was performed in Dutch but reported in the English language; some nuances, for example in quotes, might have been lost.

Conclusions
Expert assessors use general indicators while assessing students' clinical reasoning during history taking. As such, these indicators could satisfy the need for a common way of assessing clinical reasoning in a practical setting such as an outpatient clinic.

PL: Principal lecturer
Availability of data and materials The data on which the conclusions of this manuscript rely are available by contacting the corresponding author.
Authors' contributions CH participated in the design of the study, data acquisition, data analysis and manuscript drafting. JM participated in the design of the study and manuscript revision. BC and PG participated in the design of the study and the data validation session. CP participated in the design of the study, data acquisition, study coordination and manuscript revision. All authors read and approved the final manuscript. Ethics approval and consent to participate Approval of the Radboud university medical center institutional ethical review board was not required according to the guidelines of the institutional review board. Participation was voluntary.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.