Variations among Electronic Health Record and Physiologic Streaming Vital Signs for Use in Predictive Algorithms in Pediatric Severe Sepsis

Abstract Objective  This study sought to describe the similarities and differences among physiologic streaming vital signs (PSVSs) and electronic health record (EHR)-documented vital signs (EVSs) in pediatric sepsis. Methods  In this retrospective cohort study, we identified sepsis patients admitted to the pediatric intensive care unit. We compared PSVS and EVS measures of heart rate (HR), respiratory rate, oxyhemoglobin saturation, and blood pressure (BP) across domains of completeness, concordance, plausibility, and currency. Results  We report 1,095 epochs comprising vital sign data from 541 unique patients. While counts of PSVS measurements per epoch were substantially higher, increased missingness was observed compared with EVS. Concordance was highest among HR and lowest among BP measurements, with bias present in all measures. Percent of time above or below defined plausibility cutoffs significantly differed by measure. All EVS measures demonstrated a mean delay from time recorded at the patient to EHR entry. Conclusion  We measured differences between vital sign sources across all data domains. Bias direction differed by measure, possibly related to bedside monitor measurement artifact. Plausibility differences may reflect the more granular nature of PSVS which can be critical in illness detection. Delays in EVS measure currency may impact real-time decision support systems. Technical limitations increased missingness in PSVS measures and reflect the importance of systems monitoring for data continuity. Both PSVS and EVS have advantages and disadvantages that must be weighed when making use of vital signs in decision support systems or as covariates in retrospective analyses.


Introduction
Sepsis is a leading cause of mortality among children admitted to the pediatric intensive care unit (PICU). 1 Early detection of sepsis is critical, as the diagnosis is often delayed in the PICU, 2 leading to delayed therapy and worse outcomes. 38][9] Additionally, retrospective identification of sepsis patients through computable phenotypes provides important datasets for researchers. 10,11 Notably, a key feature in nearly all of these models is patient vital signs, including heart rate (HR), respiratory rate (RR), oxyhemoglobin saturation (SpO2), and blood pressure (BP). The prediction accuracy of such models depends on the accuracy of recorded vital signs.
The widespread implementation of electronic health records (EHRs) has enabled electronic vital sign documentation through manual entry or verification from integrated patient monitors. 124][15] Physiologic streaming vital signs (PSVSs) from patient bedside monitors offer an alternative to EHR-recorded vital signs (EVSs), providing more granular data in near-real time. 16,17[20]

Objective
Accurate vital sign measurement is essential to the detection and prediction of clinical deterioration in pediatric sepsis as well as in retrospective identification of sepsis patients. The goal of this study is to describe the similarities and differences between streaming vital signs (PSVS) and documented vital signs (EVS) in PICU patients with sepsis. We hypothesize that PSVS and EVS will differ in the measured domains, based on known technical limitations of PSVS data storage as well as the entry and validation procedures required for EVS.

Methods
We designed a retrospective cohort study including patients admitted to a 55-bed single-center academic PICU from July 2017 through December 2020. This study was granted an exemption (# 19-016133, initial approval 5/24/2019, addendum approval 12/16/2020) by the Institutional Review Board at the Children's Hospital of Philadelphia.
We identified eligible encounters as those with a PICU admission and subsequent PICU discharge during the study period and an ICD-9 (International Classification of Diseases, 9th Revision) diagnosis of severe sepsis (995.92) or septic shock (785.52), as recorded in this site's Virtual Pediatric Systems (VPS) quality improvement database by trained data-entry nurses. These diagnosis codes have previously been shown to accurately identify sepsis encounters. 21 We extracted admission-discharge-transfer (ADT) events from the EHR (Epic Systems, Verona, Wisconsin, United States) database. We defined an ADT epoch as a bed assignment in the PICU with bounding ADT entry and exit events, within a single patient encounter. Epochs were excluded when the epoch duration was less than 1 hour or when the location was not a PICU bed (e.g., perioperative).
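The epoch exclusion rule above can be sketched in a few lines. This is a minimal illustration, not the study's extraction code: the `AdtEpoch` record, its field names, and the department labels are hypothetical, and we assume an epoch lasting exactly 1 hour is retained.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AdtEpoch:
    """One bed assignment bounded by ADT entry and exit events (hypothetical schema)."""
    encounter_id: str
    department: str   # hypothetical department label for the assigned bed
    start: datetime   # bounding ADT entry event
    end: datetime     # bounding ADT exit event


# Assumed label set for PICU beds; real department names will differ by site.
PICU_DEPARTMENTS = {"PICU"}
MIN_DURATION = timedelta(hours=1)


def include_epoch(epoch: AdtEpoch) -> bool:
    """Keep epochs located in a PICU bed that last at least 1 hour."""
    in_picu = epoch.department in PICU_DEPARTMENTS
    long_enough = (epoch.end - epoch.start) >= MIN_DURATION
    return in_picu and long_enough


epochs = [
    AdtEpoch("E1", "PICU", datetime(2019, 1, 1, 8, 0), datetime(2019, 1, 3, 8, 0)),
    AdtEpoch("E1", "PERIOP", datetime(2019, 1, 3, 8, 0), datetime(2019, 1, 3, 12, 0)),
    AdtEpoch("E2", "PICU", datetime(2019, 2, 1, 8, 0), datetime(2019, 2, 1, 8, 30)),
]
included = [e for e in epochs if include_epoch(e)]
```

In this toy input, only the first epoch survives: the second is a perioperative location and the third lasts 30 minutes.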
We collected demographics, associated diagnoses, length of stay, and patient outcome from our institutional VPS database. We extracted EVS for the variables including HR, RR, SpO2, and BP, measured by noninvasive methods (noninvasive [cuff] BP [NBP]) or invasive arterial monitoring (arterial line [transduced] BP [ABP]). EVSs were either manually entered or integrated into the EHR from the bedside monitor network and then individually validated by the bedside nurse. PSVSs were broadcast from General Electric (GE) Solar 8000 bedside monitors on the GE CareScape network and captured by a Nuvon (Capsule Technologies, Andover, Massachusetts, United States) IDM 4000 interface, transmitted to an IBM InfoSphere stream server and stored in an IBM Hive database approximately every 5 seconds. In this study, we do not make use of continuous (waveform) physiologic signals, which increase data granularity at the expense of acquisition and storage complexity. Nonphysiologic ("implausible") values were excluded from the dataset. PSVS timestamps were assumed to be synchronous with EHR timestamps. We mapped EHR beds to corresponding bedside monitors and extracted the same vital sign variables as above for all available timestamps in each ADT epoch.
We analyzed ADT epochs with paired EVS and PSVS data across the domains of completeness, concordance, plausibility, and currency. 22 We report counts of vital signs by source, as well as median per epoch. We assessed completeness by measuring the number of hours of the ADT epoch and calculating the percent of hours without a vital sign recorded. Because NBP values are measured on the patient intermittently but bedside monitors continue to broadcast values continuously, we filtered the PSVS dataset to remove all NBP values which occurred >1 hour after they were measured on the patient (e.g., if the last measured NBP occurred at 10:15 a.m., we removed all NBP values which streamed from 11:15 a.m. until a new NBP was measured on the patient). This duration was chosen as it was the most common measurement interval for NBP.
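The completeness measure and the NBP staleness filter described above could be implemented roughly as follows. This is a sketch under stated assumptions: hours are counted as consecutive 60-minute bins from the epoch start, and each streamed NBP record is assumed to carry the time the cuff reading was taken (a hypothetical field; the study does not state how measurement time was recovered). Function and variable names are illustrative.

```python
import math
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def percent_hours_missing(epoch_start, epoch_end, timestamps):
    """Percent of hour-long bins in the epoch that contain no vital sign."""
    n_hours = math.ceil((epoch_end - epoch_start) / HOUR)
    observed = set()
    for t in timestamps:
        if epoch_start <= t < epoch_end:
            # Index of the hour bin this measurement falls into.
            observed.add((t - epoch_start) // HOUR)
    return 100.0 * (n_hours - len(observed)) / n_hours


def drop_stale_nbp(records, max_age=HOUR):
    """Drop broadcast NBP values older than max_age.

    records: list of (broadcast_time, measured_time, value) tuples,
    where measured_time is an assumed field for the cuff reading time.
    """
    return [r for r in records if r[0] - r[1] <= max_age]


start, end = datetime(2019, 1, 1, 8, 0), datetime(2019, 1, 1, 12, 0)
hr_times = [datetime(2019, 1, 1, 8, 10), datetime(2019, 1, 1, 8, 50),
            datetime(2019, 1, 1, 10, 30)]
missing_pct = percent_hours_missing(start, end, hr_times)

measured = datetime(2019, 1, 1, 10, 15)
nbp_stream = [(measured + timedelta(minutes=m), measured, 92) for m in (0, 30, 75, 105)]
fresh_nbp = drop_stale_nbp(nbp_stream)
```

Here two of the four hour bins contain no HR value, and the NBP values rebroadcast 75 and 105 minutes after the cuff reading are removed.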
To measure concordance between the two vital sign sources, we calculated the numeric distribution of the PSVS values within ± 20 minutes of each EVS recorded time and calculated the distribution mean and variance. We projected the EVS value onto this distribution and calculated its z-score position on the distribution. A positive z-score indicated that the EVS was larger (higher) than the mean of the PSVS distribution ("over-estimating"), while a negative z-score indicated that the EVS was smaller (lower) than the mean of the PSVS distribution ("under-estimating").
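A minimal sketch of this z-score calculation is shown below. It assumes the sample standard deviation is used (the text specifies only "variance") and returns no score when the window holds fewer than two streamed values or the values do not vary; both choices are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def concordance_z(evs_time, evs_value, psvs, window=timedelta(minutes=20)):
    """z-score of one EVS value against PSVS values within +/- window.

    psvs: list of (timestamp, value) tuples from the streaming source.
    Returns None when a z-score cannot be computed.
    """
    nearby = [v for t, v in psvs if abs(t - evs_time) <= window]
    if len(nearby) < 2:
        return None
    sd = stdev(nearby)  # sample SD; an assumption, see lead-in
    if sd == 0:
        return None
    return (evs_value - mean(nearby)) / sd


charted_at = datetime(2019, 1, 1, 10, 0)
streamed = [
    (charted_at - timedelta(minutes=10), 100),
    (charted_at, 110),
    (charted_at + timedelta(minutes=10), 120),
    (charted_at + timedelta(minutes=30), 200),  # outside the window, ignored
]
z = concordance_z(charted_at, 105, streamed)
```

In this example the charted HR of 105 sits half a standard deviation below the streamed mean of 110, so the z-score is negative ("under-estimating").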
We defined the plausibility of a vital sign value as whether the vital sign makes sense in light of what it is measuring. We applied age-specific cutoffs taken from a published consensus-derived sepsis pathway 23 which were derived from Goldstein et al. 24 The SpO2 cutoff was fixed at 90% for all ages. We measured the total duration, or time until next vital sign measurement, and percent of time that vital signs fell above (or below) the consensus cutoff values. Based on our clinical population of interest, we considered time below SpO2 and ABP cutoffs, and time above HR and RR cutoffs. We calculated the currency of EVS as the difference between the entry time and the recorded time. A positive difference indicated that the vital sign was entered in the EHR ("entry time") after the time it was recorded from the patient ("recorded time"), whereas a negative difference indicated that it was entered into the EHR prior to the time it was supposedly recorded from the patient. PSVSs had no measurable delay from bedside monitor to database storage and were considered current.
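The plausibility and currency measures can be illustrated with the sketch below. It assumes each value is carried forward until the next measurement, and that the last value in an epoch is carried to the epoch end (the text does not say how the final interval was handled). The HR cutoff of 160 is a placeholder, not one of the age-specific consensus values.

```python
from datetime import datetime, timedelta


def percent_time_beyond(samples, cutoff, epoch_end, direction="above"):
    """Percent of covered time that a vital sign is beyond a cutoff.

    samples: time-sorted list of (timestamp, value) tuples.
    Each value is held until the next sample, or until epoch_end for the last one.
    """
    beyond = timedelta(0)
    total = timedelta(0)
    for i, (t, value) in enumerate(samples):
        t_next = samples[i + 1][0] if i + 1 < len(samples) else epoch_end
        held = t_next - t
        total += held
        is_beyond = value > cutoff if direction == "above" else value < cutoff
        if is_beyond:
            beyond += held
    if not total:
        return None
    return 100.0 * (beyond / total)


def currency_minutes(recorded_time, entry_time):
    """Entry time minus recorded time, in minutes (positive means delayed entry)."""
    return (entry_time - recorded_time).total_seconds() / 60.0


t0 = datetime(2019, 1, 1, 10, 0)
hr = [(t0, 150), (t0 + timedelta(minutes=20), 170), (t0 + timedelta(minutes=40), 165)]
spo2 = [(t0, 95), (t0 + timedelta(minutes=30), 88)]
epoch_end = t0 + timedelta(hours=1)

hr_pct_above = percent_time_beyond(hr, cutoff=160, epoch_end=epoch_end)  # placeholder cutoff
spo2_pct_below = percent_time_beyond(spo2, cutoff=90, epoch_end=epoch_end, direction="below")
delay = currency_minutes(t0, t0 + timedelta(minutes=6))
early = currency_minutes(t0, t0 - timedelta(minutes=5))
```

Normalizing by the covered time rather than the full epoch duration mirrors the idea that availability differs by measure within the same epoch.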
Descriptive statistics are reported as mean (standard deviation, SD) or median (interquartile range, IQR). Distributions are compared using t-tests and ANOVA (analysis of variance) when normality is assumed, or Wilcoxon signed rank and Kruskal-Wallis tests when not normally distributed. Unless otherwise stated, a value of p < 0.05 is considered significant. All analyses were performed in RStudio 25 and made use of the packages odbc, 26 dbplyr, 27 dplyr, 28 ggplot2, 29 cowplot, 30 and knitr. 31

Results
A total of 611 PICU encounters met inclusion criteria (4.5% of all PICU encounters during the study period), which included 541 unique patients and 1,095 ADT epochs. After removing epochs less than 60 minutes' duration and department locations not in the PICU, our resulting analysis included 808 ADT epochs from 608 PICU encounters representing 539 unique patients (►Fig. 1). The median duration of these ADT epochs was 112 hours (IQR: 31 to 305). Patient and encounter demographics and outcomes are shown in ►Table 1.

Completeness
The count of PSVS measurements per included ADT epoch was substantially higher (>400-fold) compared with EVS measurements (►Table 2). However, substantial missingness was observed across all measures from PSVS, including 125 ADT epochs (

Plausibility
Time above (or below) vital sign cutoffs was calculated for each filtered ADT epoch with PSVS or EVS (N = 678) and normalized to the total time each measure was available, as this time was different by measure for each epoch (see the Completeness section above and ►Fig. 2C). Percent of time above or below cutoffs was significantly different by source for each measure, as well as significantly different by measure (►Fig. 2D

Currency
All measures demonstrated a positive time difference between entry time and recorded time, indicating a delay in entering these values into the EHR. The magnitude of delay, in minutes, was significantly different by measure (Kruskal-Wallis rank sum, p < 0.001), and post hoc pairwise testing demonstrated significant differences in time difference among all pairs except for SpO2 (6 minutes [1-22]) and HR (6 minutes [1-22]). The greatest median time difference was for NBP (9 minutes [1-27]) and the lowest median time differences were for SpO2 and HR. All measures also included values with negative time differences, indicating that the values were entered into the EHR prior to the documented recorded time from the patient or that the documented recorded time was inaccurate, with the greatest percentage in NBP (15.1%) and the lowest in ABP (7.7%). The density distribution of time difference for a representative measure (HR) is shown in ►Supplementary Fig. S2 (available in the online version).

Discussion
In this study, we identified significant differences between vital sign sources across physiologic measures, suggesting that each source has advantages and disadvantages that must be considered when used for prospective or retrospective analyses. Concordance was highest among HR measurements and lowest among BP measurements, though bias existed across all measures. Plausibility significantly differed among measures and is critical for vital sign use in either prospective decision support or retrospective studies. All EVS measures demonstrated a mean delay from time recorded at the patient to EHR entry. Although these results may be expected, to our knowledge this is the first study to systematically compare these measures across data domains in critically ill children.

Fig. 1 Flowchart of PICU encounters and ADT epochs. PICU encounters were filtered by "Severe Sepsis or Septic Shock" inclusion criteria and ADT epochs were extracted from these encounters. Subsequently, ADT epochs were filtered to include only those >1 hour and located in a PICU bed. Lastly, several ADT epochs contained zero physiologic streaming vital signs recorded during the epoch. ADT, admission-discharge-transfer; PICU, pediatric intensive care unit.
In our institution, PSVSs are captured at the bedside and displayed on monitors, both in the patient's room and at a central unit location. Monitor alarms are triggered from these measures based on parameters established by clinicians and nurses, and typically transmit to nursing communication devices. Parameterized vital signs stream to our EHR regularly and are edited or verified by nurses, who may also document independent of these suggested vital signs. Based on this flow into the EHR, we hypothesized that EVS would be highly concordant with PSVS. Although measures were similar, we did identify bias in each measure. These biases may reflect "smoothing," as described by others in adult patients, 32,33 where manually charted data were less likely to reflect abnormal values. Interestingly, the direction of these differences varied by measure, with EHR-recorded SpO2 tending to be higher than the streamed value while EHR-recorded HR, RR, and ABP tended to be lower. This is perhaps related to measurement artifact on SpO2 waveforms, which tends to artifactually lower SpO2, or to similar artifact in electrocardiogram (ECG)-derived measures, which can artifactually raise HR and RR (e.g., some forms of chest physiotherapy can artifactually mimic ventricular tachycardia on three-lead ECG measurements). Agreement was best (highest r² value) for HR, which indicates it may be the measure least affected by artifact. Because deterioration indices and other decision support tools rely on vital signs to trigger warnings about patients in the intensive care unit (ICU), developers must understand each data source's own strengths, limitations, and biases. 33
In this population of sepsis patients, we chose to examine plausibility not simply on whether or not a value is likely to be true, but rather based on the vital sign's relationship to predefined age-adjusted sepsis vital sign criteria. This definition was chosen to mimic detection of "abnormal" vital signs by a decision support system, one anticipated use of these vital sign sources. We hypothesized that this cohort would be enriched for values which suggest patients meet these rudimentary sepsis vital sign criteria. Although concordance was similar, plausibility percentages were different between sources across all vital sign measures. These differences could be explained by the more granular nature of PSVS, whereby patients' vital signs may ebb and flow within an hour between EVS (which may demonstrate relative concordance in the time surrounding that recorded vital sign). This ebb and flow is critical to both illness detection such as sepsis surveillance 8 and to retrospective studies using "time above" or "time below" a priori cutoffs as exposure variables. For example, the use of granular PSVS to measure hypotension as opposed to EVS may provide stronger evidence for minimizing systolic hypotension in postarrest care. 34 In the future, even more granular data sources, such as continuous waveform data, may further improve illness detection but may require advanced analysis techniques such as machine learning. It may not be surprising that HR and RR were the measures which most often fell above the vital sign cutoff in this patient population. This may be due to elevated HR in sepsis patients. Additionally, the chosen cutoffs, while age-based, were clustered into only five separate age groups. As discussed above, ECG-derived respiration may over-estimate compared with charted RR and may contribute to the increase in percent above plausible values for this measure.
Previous studies have demonstrated errors in manual documentation of vital signs for a variety of reasons. 14,15,35 Kallioinen et al described five sources of inaccuracy related to adult RR measurement including awareness effect (similar to the Hawthorne effect), observation methods, observer variability, value bias, and recording omission. 15 Similarly, Skyttberg et al identified low completeness and poor currency in EHR-documented emergency department vital signs and concluded that these were insufficient for early warning system usage. 14 Our results also demonstrate deficiencies in completeness, though these were predominantly in PSVS and likely due to technical glitches (see below); EVSs were nearly perfectly complete with the exception of BP measures which were dependent on invasive monitoring placement.
This study confirms the poor currency of EVS suggested in other studies, with a median 6- to 9-minute delay and the upper quartile experiencing a >20-minute delay in documentation. Skyttberg et al 36 have examined factors related to these currency delays and proposed a series of steps for vital sign quality improvement in the emergency room, some of which are applicable to the ICU environment (e.g., provide workflow support and perform quality control). Ultimately, a delay in data entry presents challenges for the delivery of real-time clinical decision support (CDS) in an acute care environment.
In addition to the limitation of this being a single-center study, we observed technical limitations which caused substantial missingness in our PSVS measures. Unfortunately, because no business operations systems were making use of the streaming parameterized vital signs, long periods of data missingness went unnoticed and resulted in gaps in stored data. Reviews of these disruptions suggested that buffering capabilities in the messaging interface were responsible for the initial failure, followed by a lack of "monitoring of monitors" systems to notify the appropriate supports. These failures have been mitigated in the most recent 6 months' worth of PSVS data. Fortunately, the missingness was random with respect to patients, rooms, and time of year, so should not have systematically biased these results. These challenges suggest, similar to automated anomaly detection recommended in CDS implementations, 37 that institutions should plan automated streaming monitor systems when using PSVS in production clinical environments.
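One simple form of "monitoring of monitors" is a check for silence in the stored stream. The sketch below is only an illustration of that idea, not the mitigation deployed at our institution: the 5-minute threshold is an arbitrary assumption relative to the roughly 5-second storage cadence, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap=timedelta(minutes=5)):
    """Return (gap_start, gap_end) pairs where consecutive stored values
    are separated by more than max_gap (assumed threshold)."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


def is_stale(last_seen, now, max_silence=timedelta(minutes=5)):
    """True when no value has been stored for longer than max_silence,
    which could be used to trigger a notification to support staff."""
    return now - last_seen > max_silence


t0 = datetime(2020, 1, 1, 0, 0, 0)
# Four values at a 5-second cadence, a 2-hour outage, then four more values.
stream = [t0 + timedelta(seconds=5 * i) for i in range(4)]
stream += [t0 + timedelta(hours=2, seconds=5 * i) for i in range(4)]

gaps = find_gaps(stream)
stale_now = is_stale(stream[-1], now=t0 + timedelta(hours=3))
```

Running `find_gaps` retrospectively quantifies past outages, while `is_stale` evaluated on a schedule is the kind of check that could surface an outage while it is still in progress.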

Conclusion
Vital signs are critical features of prospective CDS systems and covariates in retrospective analyses. Both PSVS and EVS have advantages and disadvantages, and the trade-off between decreasing bias and increasing currency must be weighed against the possibility of artifactual plausibility issues and technical limitations related to completeness. Real-time CDS implementations should consider cross-domain retrospective analyses and comparisons across data sources and measures, in addition to accounting for source anomaly detection.

Clinical Relevance Statement
Vital signs provide critical objective data for patient care in the intensive care unit and beyond, and are also features in clinical decision support tools such as early warning systems or best-practice alerts. Accurate, timely, and plausible representation of patient vital signs is necessary for these tools to operate appropriately. In this study, we demonstrate the advantages and disadvantages of physiologic streaming and EHR-recorded vital signs.

Protection of Human and Animal Subjects
This study was granted an exemption (# 19-016133, initial approval 5/24/2019, addendum approval 12/16/2020) by the Institutional Review Board at the Children's Hospital of Philadelphia.

Fig. 2 Comparison of physiologic streaming and EHR vital signs. (A) Percent of all hours in ADT epochs with missing vital sign measures, by source. (B) Violin plots with inset box-whisker plots of z-score concordance between EHR and streaming vital signs for each measure. Positive z-score indicates EHR vital sign is larger than physiologic streaming vital sign mean value, while a negative z-score indicates EHR vital sign is smaller than physiologic streaming vital sign mean value. (C) Sample time series plot of physiologic streaming vital sign (small black dots) with median filter smoothed line (blue) and EHR vital signs (large red diamonds). Shaded gray represents the time at which the physiologic streaming vital sign values are above the age-specific cutoff, shown in light blue horizontal line. No EHR vital signs shown were above age-specific cutoff. (D) Box-whisker plots of the percent of time above (or below) age-specific vital sign thresholds, per vital sign measure and source. Note that the y-axis was terminated at 80% to better highlight box differences, though whiskers and points for some measures extend beyond 80%. All measures were significantly different between sources (pair-wise Wilcoxon signed rank, p ≤ 0.01) and among source (Kruskal-Wallis, p < 0.001). ADT, admission-discharge-transfer; EHR, electronic health record.

Fig. 3 Scatter plots of physiologic streaming versus EHR-recorded vital sign concordance by measure. Each point represents the mean of the physiologic streaming vital signs from the ± 20 minutes surrounding each EHR-recorded vital sign. Contours (red) are overlaid to highlight the distribution of overlapping points. ABP, arterial blood pressure; HR, heart rate; RR, respiratory rate; SpO2, peripheral oxygen saturation.
15.5% of total) which had zero PSVS recorded, compared with no epochs with zero EVS recorded. The number of ADT epochs which were completely missing PSVS measurements was significantly different by measure (HR: 126 [15.6%], RR: 131 [16.2%], ABP measurements, these values are only present in patients with functioning arterial catheters and are not expected in all patients or epochs. For subsequent analyses, we filtered to remove ADT epochs which were completely missing vital signs for each measure and source. Percent missingness, filtered for those not completely missing vital signs, was significantly different between sources for all measures, analyzed by the Wilcoxon rank sum test (p < 0.001 for all except ABP, where p = 0.03).

Table 1 Patient and encounter demographic information, outcomes, and common associated diagnoses
a Categories provided by Virtual Pediatric Systems (VPS).
b Pediatric risk of mortality (PRISM) score calculated over the first 12 hours of admission, where higher scores are associated with longer length of stay and greater risk of mortality. 38
c Number and percent of PICU encounters which contain this active diagnosis in the VPS database. Subsets of diagnoses were selected as most frequent and representative of the admitted population.

Table 2 Vital sign counts by measure and source (EHR vs. physiologic streaming)
Abbreviations: ABP, arterial blood pressure; EHR, electronic health record; HR, heart rate; IQR, interquartile range; NBP, noninvasive (cuff) blood pressure; RR, respiratory rate; SpO2, peripheral oxygen saturation.
Note: Summaries per ADT epoch include only those meeting ADT epoch inclusion criteria (>1 hour and from a PICU bed, N = 808).