Introduction

For the neonatal transition from intrauterine life to extrauterine life to occur, a series of complex interrelated physiological adjustments must take place to change how key organ systems function.1,2 If any of these physiological adjustments is unsuccessful, the newborn may experience life-threatening consequences such as hypoxia and ischaemia.3 At present, sensors indicating the state of the crucial physiological and anatomical adjustments that support the neonatal transition are unavailable, minimal or compromised.1,2,3,4,5 Furthermore, monitoring devices currently used and recommended in contemporary resuscitation practice measure vital signs, but they do not convey the physiological and anatomical state of the newborn—the clinician must infer the state from the vital signs. As a result, the newborn is often viewed as a ‘black box’; clinicians cannot know for certain what is happening physiologically at any moment in the first 10 min after birth.1,2,3 However, such knowledge is vital not only for monitoring the infant but also for determining underlying conditions and appropriate treatment actions. Thus, until sensors capable of measuring specific physiological and anatomical parameters related to the neonatal transition are developed and integrated into practice, the challenge is to find the best approximation of the neonate’s true physiological state that could inform clinicians about physiological adjustments that are currently unobservable. We use the term “true physiological state” to describe the rapidly changing and interrelated state of the infant’s lung aeration, organ blood flows, fetal shunts and other variables, including both those visible and invisible to the resuscitation team.

To the best of our knowledge, a newborn’s true physiological state during transition has largely been explored using animal participants and highly invasive equipment.6,7,8 Where non-invasive measurements have been used on human infants, the newborns were delivered via caesarean section without transition complications and/or congenital malformations.9,10,11 Thus, it is unclear whether such equipment is robust in the resuscitation of a newborn with transition complications and/or congenital malformations. However, an effective first approximation to this information may be the similarity of expert neonatal clinicians’ inferences about what is happening on the basis of indirect information—a property that has been called ‘concordance’.12,13 For instance, during newborn resuscitation, clinicians can observe the outputs of monitoring instruments such as the neonate’s heart rate and oxygen saturation and clinical signs such as skin colour, tone and respiratory effort. If there is a high degree of concordance among expert clinicians in their interpretations of the signs, then analysts could use their combined interpretations as a suitable approximation of the neonate’s physiological and anatomical progress during the transition. Notably, the concordance of expert neonatal clinicians’ judgements, inferences, and decision-making during the newborn transition has never been explored. However, concordance-based approaches have been used in other clinical contexts where the true physiological state is unavailable, and to validate differential diagnoses provided by diagnostic support algorithms.14,15,16

The rationale for this study is that if sufficient concordance can be achieved among expert clinicians’ interpretations of newborns’ vital signs during the transition, then this information (the diagnosis or diagnoses) could potentially substitute for the sensors that are currently unavailable, minimal or compromised, and inform the design of a cognitive aid. Cognitive aids are tools that guide a user through the performance of a task with the purpose of reducing task errors or omissions and improving speed and fluidity of performance.17,18 Cognitive aids can be in the form of cards, posters or desktop or hand-held computer-based help systems. A successful cognitive aid will improve the user’s comprehension of the situation and their selection of actions. Recent evidence indicates that cognitive aids designed for anaesthesia, acute care, emergency department resuscitation and neonatal resuscitation can improve clinicians’ task performance.19,20,21,22,23 However, few cognitive aids have suggested diagnoses to the end-user.24 Thus, evaluating the concordance of clinicians’ diagnoses as a potential indicator of the neonate’s physiological status during resuscitation is a crucial first step toward the design of a cognitive aid that could supplement existing resuscitation algorithms by suggesting diagnoses. The cognitive aid could help clinicians monitor and understand the physiological state of the infant, guiding their treatment decisions where the algorithms do not offer clear direction, as with atypical responses to resuscitation interventions or less common physiological conditions, including lung hypoplasia, congenital diaphragmatic hernia and congenital heart defects.

The primary purpose of this study was to explore whether there was sufficient concordance of experts’ diagnoses that they could act as a substitute for the currently unavailable direct measurement of crucial physiological and anatomical parameters in the neonatal transition.

Methods

Context and participants

This two-phase structured interview study was conducted with senior neonatologists from several tertiary institutions in Australia and New Zealand. In Phase 1, eight senior neonatal consultants were recruited from two tertiary hospitals in Brisbane, Queensland, Australia (n = 7 and n = 1, respectively). The participants in Phase 1 were recruited via a personal approach from one of the authors (H.L.) who is a senior neonatologist in a tertiary hospital in Brisbane, Queensland, Australia. All of the participants in Phase 1 regularly participated in newborn resuscitation and several participants provided newborn resuscitation training to registrars and midwives.

In Phase 2, 12 senior neonatal consultants were recruited from eight tertiary hospitals either via personal approach by the same author from Phase 1 or via two other neonatologists in different tertiary hospitals. The geographic distribution of participants was as follows: Queensland (three hospitals, n = 6); New South Wales (one hospital, n = 2); Victoria (n = 1); South Australia (n = 1); Western Australia (n = 1); and New Zealand (n = 1). All participants in Phase 2 were involved in newborn resuscitation research and practice and several participants were also involved in the development of neonatal resuscitation guidelines in Australia and New Zealand.

The study was given ethical approval by Mater Misericordiae Ltd (approval 53861) and by the Human Research Ethics Committee at The University of Queensland (approval 2019002697). Participants were required to have at least two years of experience as a senior fellow or consultant in newborn medicine. All participants provided written informed consent.

Design and procedure

The structured walkthrough interview technique is commonly used in human–computer interaction studies to elicit knowledge from subject matter experts engaged in complex tasks and it can be tailored for a specific project domain.25 In both Phase 1 and Phase 2, participants were presented with eight neonatal trajectories that the authors had extracted and presented as timeline graphics from the Dawson et al.26 database, which documents the real-time heart rate and oxygen saturation values of ~465 neonates in the first 10–15 min after birth. The 25th and 75th percentile boundaries were inferred by Dawson et al.26 in 2010 from the database of neonatal trajectories. Further information was also available for each neonatal trajectory, including gestation, birth weight, mode of delivery and supplemental oxygen. The trajectories chosen for the present studies reflected the most common patterns in neonatal transitions found in the database. During the interview, the voice and hands/cursor of the participant were captured on video to preserve the rich level of detail in the participant’s responses and to help the researcher subsequently transcribe the interviews.

Phase 1: The researcher conducted the structured walkthrough interviews in person and individually with each participant. The researcher first introduced the participant to the format of the neonatal trajectories and gave them a brief outline of the key interview questions.

The researcher showed the first neonatal trajectory on a laptop computer. Each trajectory was successively revealed in a series of snapshots, where each snapshot took the trajectory up to a potentially clinically meaningful moment in the resuscitation (see Fig. 1). As a result, the number of snapshots varied between the eight trajectories. At each snapshot, the remainder of the trajectory was occluded to avoid bias that might result if the participant could see the entire trajectory.

Fig. 1: An example of a neonatal trajectory in Phase 1 being gradually revealed in snapshots during the interview.

The graph depicts several snapshots—each snapshot is defined by the boundaries of the red lines. At each snapshot, the researcher would administer the same interview questions, but in relation to the new portion of trajectory that had been revealed.

During the presentation of each snapshot, the researcher allowed the participant to freely articulate their thoughts, further stimulating the discussion by asking the participant key questions to elicit their opinion of potential underlying physiological diagnoses or physiological states at that point. The researcher did not suggest diagnoses or physiological states or offer any affirmation or contradiction of the participant’s suggestions. Below are some examples of the key questions used in the interview:

  • What is the concerning sign/s that you see in the trend at this point in time?

  • Given the gestational age of the neonate, what are the most likely underlying conditions?

  • For each condition/s, why are the sign/s representative of the condition/s you have identified?

The researcher repeated this process for the remaining snapshots within the neonatal trajectory. Once the entire trajectory had been revealed, the researcher asked the participant to provide a global differential diagnosis for the neonatal trajectory. The process outlined above was repeated for each neonatal trajectory. At the conclusion of the interview, the researcher provided an overview of the study’s rationale to the participant and informed them that a transcript of the interview would be provided within one or two days if they wished to review it.

Phase 2: The researcher again conducted structured walkthrough interviews individually with each participant, but the interviews were now exclusively online due to COVID-19 pandemic restrictions. The researcher first introduced the participant to the format of the neonatal trajectories and gave them a brief outline of the key interview questions. Seven of the newborn trajectories were the same across Phase 1 and Phase 2—the eighth trajectory from Phase 1 was replaced in Phase 2 because participants had inferred that no underlying physiological derangements were present.

A browser-based application presented trajectories to the participant at ten times real-time speed (see Fig. 2). In contrast to Phase 1, where snapshots were chosen in advance by the researcher, in Phase 2 the participant was asked to pause the progression of the trajectory when they recognised a pattern in the vital signs that was clinically meaningful to them. During the pause, the researcher first allowed the participant to articulate their thoughts and, where necessary, further stimulated the discussion by asking the participant the same key questions as in Phase 1 to elicit their opinion of potential underlying physiological diagnoses or physiological states at that point. The researcher did not suggest diagnoses or physiological states or offer any affirmation or contradiction of the participant’s suggestions. The participant and researcher repeated this process until the entire trajectory had been revealed. The researcher then asked the participant to provide a global differential diagnosis for the neonatal trajectory. The above process was repeated for all trajectories and the interview was concluded as described for Phase 1. Figure 3 summarises the key procedural similarities and differences between the Phase 1 and Phase 2 interviews.

Fig. 2: An example of a neonatal trajectory in Phase 2 being gradually revealed by the browser-based application (10× real speed).

The shaded portion of the trajectory depicts the participant’s first ‘pause’, where the researcher would administer the interview questions. The unshaded portion of the trajectory depicts the participant continuing to reveal the trajectory, anticipating a second ‘pause’.

Fig. 3: A high-level comparison of the procedural similarities and differences between Phase 1 and Phase 2 of the structured interview study.

In Phase 1 the participants completed the interview in person, whereas in Phase 2 the participants completed the interview online. Furthermore, in Phase 1 the trajectories were presented to the participants as static snapshots, whereas in Phase 2 the trajectories were presented to the participants dynamically (at ten times real-time speed).

The selection of trajectories, the division of each trajectory into snapshots, and the construction of questions for the interviews were all completed with the input of a senior neonatologist (author H.L.).

Measures and data analysis method

The measures used were one or more of the differential diagnoses provided by each participant, in order of likelihood. For Phase 1 and Phase 2 separately, we analysed (1) the concordance among clinicians’ differential diagnoses within and between the snapshots of a neonatal trajectory and (2) the concordance among clinicians’ differential diagnoses when they reflected on the entire neonatal trajectory—the global differential diagnosis concordance. Because the study was largely qualitative, we could not apply typical quantitative measurements of agreement or concordance such as Cohen’s kappa or Fleiss’ kappa to the data without violating a number of their assumptions.27,28 Nevertheless, as described below, we aggregated the participants’ responses and interpreted them using Landis and Koch’s29 interpretation of the level of agreement (typically used for kappa scores).

Phase 1: For each snapshot of a trajectory, the number of participants who hypothesised that a specific underlying condition(s) could explain the pattern in vital signs was counted, then divided by the total number of participants and expressed as a percentage. The degree of concordance was divided into four categories based on Landis and Koch’s29 interpretations of the reliability of agreement:

  1. Strong: ≥81% of participants hypothesised the same differential diagnosis for a given snapshot of the trajectory.

  2. Substantial: 61–80% of participants hypothesised the same differential diagnosis for a given snapshot of the trajectory.

  3. Moderate: 41–60% of participants hypothesised the same differential diagnosis for a given snapshot of the trajectory.

  4. Weak: ≤40% of participants hypothesised the same differential diagnosis for a given snapshot of the trajectory.

To explore the degree of agreement once the participants had seen the entire trajectory (global differential diagnosis concordance), the above categorisation scheme was applied.
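
The percentage calculation and categorisation described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the analysis code used in the study: the function names and the example responses are hypothetical, and the treatment of percentages that fall between the published bands (for example 80.5%) is an assumption, with each band's lower bound (81, 61, 41) taken as the cut-off.

```python
from collections import Counter


def concordance_percentages(responses):
    """Return {diagnosis: percentage of participants who named it}.

    `responses` holds one collection of diagnoses per participant for a
    single snapshot. A participant is counted once per diagnosis, even
    if they mention it more than once.
    """
    n_participants = len(responses)
    if n_participants == 0:
        raise ValueError("at least one participant is required")
    counts = Counter(dx for answer in responses for dx in set(answer))
    return {dx: 100.0 * c / n_participants for dx, c in counts.items()}


def concordance_category(percent):
    """Map a percentage onto the four concordance categories.

    Assumption: the lower bound of each band is used as the cut-off,
    so values between bands fall into the lower category.
    """
    if percent >= 81:
        return "strong"
    if percent >= 61:
        return "substantial"
    if percent >= 41:
        return "moderate"
    return "weak"


# Invented example: eight participants responding to one snapshot.
example = [
    ["respiratory distress syndrome"],
    ["respiratory distress syndrome", "apnoea"],
    ["respiratory distress syndrome"],
    ["respiratory distress syndrome"],
    ["respiratory distress syndrome"],
    ["respiratory distress syndrome"],
    ["apnoea"],
    ["apnoea"],
]
for dx, pct in concordance_percentages(example).items():
    print(dx, pct, concordance_category(pct))
```

In this invented example, six of eight participants (75%) name respiratory distress syndrome, which falls in the substantial band, while apnoea is named by three of eight (37.5%) and falls in the weak band.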

Phase 2: Given that each participant could ‘pause’ the progression of a trajectory wherever they deemed appropriate, the researcher visually identified clusters of pause points for each neonatal trajectory (see Fig. 4) where >80% of participants identified and commented on the pattern. Once the clusters were determined, the researcher counted the number of participants who hypothesised each potential differential diagnosis for that particular cluster, then divided by the total number of participants who responded and expressed the final value as a percentage. The concordance of responses was mapped onto one of the four categories for the strength of concordance for each of the clusters and for the global differential diagnosis concordance.

Fig. 4: An example of visual clustering for a neonatal trajectory in Phase 2.

Blue horizontal lines beneath the neonatal trajectory represent the ‘snapshot’ or meaningful pattern that participants were referring to when making diagnostic interpretations. Blue and yellow vertical lines beneath the neonatal trajectory represent the timestamp for the start of the ‘snapshot’ or meaningful pattern that participants were referring to when making interpretations. Clusters were identified via the following criteria: (1) ten or more participants overlapped on a start timestamp for a meaningful pattern (density of the vertical lines) and (2) ten or more participants overlapped on the snapshot or meaningful pattern driving their interpretation of the vital signs (length of the horizontal line).
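
Clusters in this study were identified visually by the researcher. Purely as an illustration of how the second criterion (ten or more participants overlapping on the same portion of the trajectory) could be checked programmatically, the sketch below sweeps over the start and end times of the patterns that participants referred to. The function name, the tuple layout and the example times are hypothetical, and the sketch assumes that one participant's own intervals never overlap each other. The first criterion (density of start timestamps) is not covered here.

```python
def overlap_regions(intervals, min_participants=10):
    """Return [(start, end)] spans covered by >= min_participants intervals.

    `intervals` is a list of (participant_id, start_s, end_s) tuples,
    with times in seconds after birth and end_s > start_s.
    Assumes a participant's own intervals do not overlap, so the running
    count equals the number of distinct participants.
    """
    events = []
    for _participant, start, end in intervals:
        if end <= start:
            raise ValueError("each interval needs end_s > start_s")
        events.append((start, 1))   # interval opens
        events.append((end, -1))    # interval closes
    # At equal times, closings (-1) sort before openings (+1), so
    # intervals that merely touch are not counted as overlapping.
    events.sort()

    regions = []
    active = 0
    region_start = None
    for time, delta in events:
        active += delta
        if active >= min_participants and region_start is None:
            region_start = time
        elif active < min_participants and region_start is not None:
            regions.append((region_start, time))
            region_start = None
    return regions


# Invented example: ten participants refer to roughly the same pattern,
# two others refer to a later one.
example_intervals = [(f"P{i}", 60 + i, 120 + i) for i in range(10)]
example_intervals += [("P10", 200, 230), ("P11", 200, 230)]
print(overlap_regions(example_intervals))
```

A threshold of ten out of 12 participants corresponds to the >80% criterion used to define a cluster.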

Results

In Phase 1 and Phase 2, the concordance within and between the snapshots or clusters of each neonatal trajectory was variable. For example, Fig. 5 depicts a neonatal trajectory that produced moderate to substantial degrees of concordance in Phase 1 and Phase 2. Although participants in Phase 2 identified fewer meaningful patterns than had been predetermined by the researchers in Phase 1, there was still some degree of overlap in the pattern of vital signs that were discussed. As Fig. 5 shows, in Phase 1 participants consistently hypothesised that respiratory distress syndrome could be driving the pattern of the vitals. In contrast, participants in Phase 2 were more likely to identify symptoms of a definitive diagnosis, including not breathing and apnoea (which both relate to respiratory issues in the newborn). In both cases, levels of concordance were in the moderate to substantial ranges.

Fig. 5: The strength of concordance for neonatal trajectory #2.

The trajectory produced a moderate-substantial degree of concordance among clinicians within/between each of the snapshots in the trajectory.

Figure 6 depicts a neonatal trajectory that produced a substantial to strong degree of concordance within each trajectory snapshot or cluster in Phase 1 and Phase 2 (further examples are in the Supplemental Materials). The degree of overlap in vital sign patterns identified and discussed in both interview phases was strong. Furthermore, in Phase 1 participants hypothesised that respiratory distress syndrome could be driving the vital sign patterns with a strong degree of agreement within and between each snapshot. However, although the agreement was strong within each snapshot in Phase 2, participants were inclined to attribute the vital sign patterns initially to extraneous influences, before agreeing strongly that the newborn was physiologically stressed (a consequence of respiratory distress).

Fig. 6: The strength of concordance for neonatal trajectory #4.

The trajectory produced a substantial-strong degree of concordance among clinicians within/between each of the snapshots in the trajectory.

Thus, although trajectories in Phase 1 and Phase 2 produced similar degrees of concordance within snapshots of a trajectory, there were differences between Phase 1 and Phase 2. Table 1 summarises the variability in the strength of concordance across the snapshots of each trajectory in Phase 1 and Phase 2.

Table 1 Summary of the strength of concordance across the snapshots of each trajectory in Phase 1 and Phase 2 of the interview study.

In contrast to the above, the global differential diagnosis concordance was stronger and more consistent across all trajectories in Phase 1 and Phase 2 (refer to Table 2). For example, in Fig. 5 there was substantial agreement that respiratory distress syndrome was the underlying physiological condition driving the pattern in the neonate’s vital signs in both Phase 1 and Phase 2. Similarly, in Fig. 6 all participants in Phase 1 and Phase 2 agreed that respiratory distress syndrome and intervention-related influences were the most likely diagnoses for the neonatal trajectory.

Table 2 Global differential diagnosis concordance among clinicians for each neonatal trajectory in Phase 1, Phase 2 and overall.

In summary, the global differential diagnosis data for each of the eight newborn trajectories produced a similar pattern of results for Phase 1 and Phase 2. Four trajectories produced a substantial degree of concordance among clinicians and four trajectories produced a strong degree of concordance among clinicians (see Table 2). Furthermore, of the seven trajectories used in both Phase 1 and Phase 2, two trajectories produced a substantial degree of concordance among clinicians in both Phase 1 and 2. Similarly, two trajectories produced a strong degree of concordance in both Phase 1 and 2 (see Table 2).

Discussion

The primary purpose of this study was to investigate the degree of concordance among expert neonatal clinicians’ interpretations of the vital sign patterns of neonates within the first 10 min after birth. If sufficient concordance is observed, then the interpretations could act as a substitute for the currently unavailable direct measurement of crucial physiological and anatomical parameters in the neonatal transition. We found that expert clinicians’ judgements of patterns in newborn vital signs appear mostly to agree. Furthermore, when clinicians consider the entire newborn transition, they produce a high-level interpretation of the potential physiological status of the infant with an even stronger degree of concordance. The implication is that differential diagnoses with a good degree of concordance could potentially substitute in part for the direct measurement of key physiological and anatomical variables, which is currently unavailable, and they could guide the design of a cognitive aid as well as supporting training.

These results provide support for theories of how expertise manifests in such complex and dynamic environments. Expert diagnosis depends on a person’s ability to perceive meaningful cues in the environment, such as a series of clinical signs co-occurring, that subsequently trigger associations in memory. The cues lead the person to certain expectancies concerning the behaviour of the system and to specific actions.30,31 Similarly, the meaningful patterns identified by the clinicians in our study could be driven by the repertoire of meaningful cues in memory, drawing clinicians’ attention to features of the newborn’s vital signs that are diagnostic of a particular underlying physiological condition. Consequently, clinicians can rapidly assess and appropriately respond to events in the transition.30

Both the literature and the present findings suggest that interpretive concordance among clinicians is an appropriate foundation for the design of a potential cognitive aid. A cognitive aid would guide the end-user towards expert clinicians’ interpretations (i.e. differential diagnoses) of patterns in newborn vital signs that signal important aspects of the physiological transition. A more accurate understanding by clinicians of the physiological status of the newborn at any point in the first 10 min after birth may help them make the most appropriate interventions—especially if the newborn’s physiological state differs substantially from the state that the resuscitation algorithms assume is ‘most likely’.

There is evidence from contexts such as emergency department resuscitation that a cognitive aid can improve clinicians’ speed and fluidity of diagnosis and treatment in resuscitation.17,19,21,23 A cognitive aid may also reduce the risk of errors of omission because it encourages the resuscitation team to consider and discuss differential diagnoses that they may have overlooked.17,18 Nevertheless, more research is needed (a) to determine the feasibility of relying on experts’ knowledge and opinion as a substitute for the direct measurement of crucial but unavailable physiological and anatomical parameters in the neonatal transition, (b) to create the physical form of the cognitive aid so that the presentation of information is most effective at the time and in the location that it is likely to be used and (c) to determine the impact of such a cognitive aid on the management of the infant in the first 10 min after birth.

Much healthcare research shows that machine learning algorithms can extract key features in physiological data to assist with diagnosis (see ref. 32 for a review). However, no such algorithm exists for the context of newborn resuscitation. Therefore, further research is needed to determine the practicality and feasibility of a newborn resuscitation cognitive aid supported by a real-time decision-support algorithm. First, the cognitive aid would need to be underpinned by an algorithm that can recognise—in real time—significant patterns anywhere in the first 10 min after birth and find an interpretation of the pattern. Second, the algorithm would need to take into account any interventions already administered to the newborn during the resuscitation and adjust the identification and interpretation of patterns appropriately.

This study has several limitations, despite the promising initial findings. Most of the limitations stem from the constraints related to (a) interviewing the specific expert population we were seeking, and (b) using a relatively constrained database of neonatal vital sign patterns in the first 10–15 min after birth.

First, we recruited a relatively small sample of experts in both phases of the study. Although the sample size is typical of recent human factors contributions to medical device design for neonatal resuscitation,33 a larger sample size may have improved the reliability of the findings. Nevertheless, the sample size was suitable for our goal of exploring concordance among experts’ responses.

Second, given the constraints on participants’ time, we could present only eight neonatal trajectories in each phase of the study—one trajectory for each kind of common neonatal transition pattern. Consequently, it was not possible to determine whether clinicians’ agreement would be consistent across many patterns of the same apparent kind. In Phase 1, the trajectories were not presented to the participant in real time, which may have compromised their ability to recognise meaningful patterns as they unfolded and interpret them. In addition, the trajectories were selected from a database that was collected in 2005,26 so their clinical courses may not be typical of infants resuscitated using contemporary guidelines. In future research, we will seek more recent neonatal resuscitation databases and present more than one example of commonly encountered trajectories closer to real time.

Third, the neonatal trajectories displayed very limited vital sign information, namely heart rate, oxygen saturation, maximum percentage of oxygen administered, and pulse oximetry signal quality. At the time these recordings were collected, devices such as electrocardiogram monitors and respiratory function monitors were not in routine use. Apart from gestation, birth weight and mode of delivery, the database did not provide information about maternal and fetal history, other interventions administered, or clinical signs of the newborn. The lack of such information may have contributed to the variability in concordance observed within and between the snapshots of each neonatal trajectory. In future research, we hope to access more recent neonatal databases that capture all this information, perhaps in video as well as numerical form. Nonetheless, the lack of information was partly an advantage. Our participants were not at risk of bias relating to presumptions from maternal or fetal history, but instead were making judgements based solely on objective measures available to them.

Fourth, we were unable to consider the baseline rate for each differential diagnosis. Participants were making judgements and inferences independently of one another, and like the investigators, were not aware of the true physiological state of the neonate. Even so, because some differential diagnoses are more likely than others, the degree of concordance that we found could be inflated. Next steps should be to redesign the interviews to satisfy the assumptions of more robust measures of inter-rater reliability, such as Fleiss’ kappa.28
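
For reference, Fleiss' kappa corrects the observed agreement for the agreement expected by chance given how often each category is used overall, which is how it addresses the base-rate concern raised above. It assumes that every rater assigns every case to exactly one of a fixed set of mutually exclusive categories, which the present interviews (with open-ended, multiple differential diagnoses) did not enforce. The sketch below is a minimal, hypothetical implementation of the standard formula; the function name and input layout are assumptions, and it has not been applied to the study data.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a cases x categories table of rating counts.

    table[i][j] is the number of raters who assigned case i to
    category j. Every row must sum to the same number of raters.
    """
    n_cases = len(table)
    if n_cases == 0:
        raise ValueError("at least one case is required")
    n_categories = len(table[0])
    n_raters = sum(table[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    for row in table:
        if len(row) != n_categories or sum(row) != n_raters:
            raise ValueError("every case needs the same raters and categories")

    # Share of all assignments that went to each category.
    p_j = [
        sum(row[j] for row in table) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    # Per-case agreement: proportion of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    observed = sum(p_i) / n_cases
    expected = sum(p * p for p in p_j)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


# Invented example: three raters, two cases, two categories,
# with complete agreement on each case.
print(fleiss_kappa([[3, 0], [0, 3]]))
```

Complete agreement, as in the invented example, gives a kappa of 1; values near 0 indicate agreement no better than chance.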

Conclusions

We conducted an analysis of the concordance of expert neonatal clinicians’ interpretation of the patterns of vital signs exhibited by neonates within the first 10 min after birth. Our results showed a substantial to strong degree of agreement among neonatal experts when they considered each trajectory as a whole. This is a small but important step towards using a majority or consensus of neonatal experts’ judgements in the design of a cognitive aid. To further validate this method, future research should seek more recent neonatal databases, present several examples of commonly encountered trajectories in close to real time, and utilise more formal measures of agreement. Nevertheless, this method could be used in other areas where clinicians do not have access to the true physiological state of the patient. Ultimately, the findings will help us design a cognitive aid to support clinicians’ management of newborns who require resuscitation after birth, possibly enhancing the health of this vulnerable population.