1 Introduction

In an Intensive Care Unit (ICU) context, patient bio-signals are continuously monitored and displayed in order to recognize alerting events. These recordings of physiologic waveforms, along with data from laboratory examinations and patient interventions (medications, medical procedures), can be difficult to interpret, especially in an environment as demanding as the ICU. Thus, there is a need to analyse and display the data in an easy-to-understand manner. Real-time analysis of patients’ bio-signals can be used to detect conditions that precede medical complications, using both domain expert knowledge and knowledge obtained by automated procedures [1].

Patient-ventilator dyssynchrony is a known issue, a prime example being ineffective triggering [2]. Ineffective triggering (IT) is common, but the factors affecting IT vary considerably and can be attributed both to the patient's condition and to the ventilation system. As a result, the frequency and overall distribution of IT vary as well.

In the medical literature, the total time spent under ineffective effort has been proposed as an index related to adverse clinical events. However, these events are not always evenly distributed in time. Ineffective triggering of the ventilator is frequent but highly variable among patients and over the course of mechanical support for each patient. As previously reported [3], most patients have short (5 min) periods of intense asynchrony, i.e. ineffective efforts (IEs). It is still an open question whether the cumulative IE exposure during the total time on ventilation or the temporal patterns of IEs relate to patient deterioration.

The AEGLE project [4] was motivated by the need to provide physicians and researchers with the tools and data necessary to explore new data and answer research questions. The datasets and tools described in this paper will enable clinicians to explore these new features and gain new insight into the related phenomena. In this work we propose a set of features that describe the morphology of IE event time series and complement the IE index, which may prove helpful in better describing patients and estimating their hospital prognosis. We also present a web-based platform that is used to perform the analysis and evaluate the outcomes.

2 The AEGLE Project Approach

The design and implementation of the 1st AEGLE platform prototype is currently underway; in parallel, analytic tools such as the ones presented in this paper are also being designed. Having the data providers, domain experts and analytics developers at different sites can hinder the design of novel analytics, since direct communication and information flow are of utmost importance, especially in the early stages.

The solution was to expose the in-development analytics directly to the domain experts via a web-based platform. This created a feedback loop between the involved parties and, as a result, a more efficient design phase.

For the development of the PVI analytics we chose R, a powerful open-source scripting language. For the last few years it has been among the most commonly used software packages in scholarly articles; in 2015 it ranked 2nd in usage and 4th in usage growth in the same domain [5]. That makes R the most popular free analytics software. An added bonus is its high flexibility, since it enables users to create custom analytics. At the same time, it benefits greatly from a large and active community that contributes analytics and visualization libraries and frameworks.

3 Methodology

3.1 Data Collection

For the purposes of this study, multi-day recordings from 108 patients were obtained from the ICU of the University Hospital of Heraklion, Crete (PAGNI), using an experimental protocol, the Patient-Ventilator Interaction (PVI) Monitor [6]. A selection of fields from the hospital Electronic Health Records (EHR) was also obtained. The raw data amount to approximately 12 Mb per patient per day on average.

The data providers uploaded pseudo-anonymized data as files to a secure location. Later, a solution better suited to time-series datasets was adopted: the NoSQL database Apache Cassandra. At this stage, however, it runs on workstation-level resources.

3.2 Preprocessing

The main problem with the dataset that hinders a frequency-based analysis is that the recordings are event-driven. As a result, the time difference between consecutive recordings varies, in several cases significantly. We applied a pre-processing tool that first applies general data cleaning methods and case-specific rules (e.g. an absence of breaths or breathing attempts for 3 min is considered an artefact) and then resamples the data to a fixed sampling interval of 30 s [7, 8].
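The resampling step can be pictured with the following minimal Python sketch. It assumes the event-driven records arrive as hypothetical `(timestamp_in_seconds, kind)` pairs, with `kind` either `"breath"` or `"ie"`. The sketch counts events in fixed 30 s bins and marks empty bins lying inside a silent stretch of at least 3 min as artefacts. It only illustrates the rule stated above and is not the actual tool of [7, 8], which applies further cleaning rules.

```python
from collections import Counter


def resample_events(events, start, end, bin_s=30, gap_s=180):
    """Turn event-driven records into fixed-interval bins (illustrative sketch).

    events: iterable of (timestamp_in_seconds, kind), kind "breath" or "ie".
    Returns one dict per bin with breath and IE counts. A bin with no events
    that lies entirely inside a silent stretch of at least gap_s seconds is
    marked as an artefact (counts None) instead of being reported as zero.
    """
    events = sorted(e for e in events if start <= e[0] < end)
    times = [start] + [t for t, _ in events] + [end]
    # Silent stretches (no breaths and no breathing attempts) of >= gap_s.
    gaps = [(a, b) for a, b in zip(times, times[1:]) if b - a >= gap_s]

    counts = Counter()
    for t, kind in events:
        counts[(int((t - start) // bin_s), kind)] += 1

    n_bins = int(-(-(end - start) // bin_s))  # ceiling division
    bins = []
    for i in range(n_bins):
        b0 = start + i * bin_s
        b1 = min(b0 + bin_s, end)
        n_breaths, n_ie = counts[(i, "breath")], counts[(i, "ie")]
        artefact = n_breaths + n_ie == 0 and any(
            g0 <= b0 and b1 <= g1 for g0, g1 in gaps)
        bins.append({
            "t": b0,
            "breaths": None if artefact else n_breaths,
            "ie": None if artefact else n_ie,
        })
    return bins
```

Marking artefact bins as `None` rather than zero keeps a disconnected sensor from being read as "no ineffective efforts" by later feature extraction.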

An ineffective-effort rate above 10 % of the total breath count is believed to cause problems and prolong hospitalization [2, 9]. We focused on periods in which the patient was experiencing serious breathing difficulties, and we call such incidents IE events [7].

3.3 Feature Extraction for IE

A commonly used feature for describing patient-ventilator interaction is the IE index:

$$ \text{IE index} = \frac{\text{Ineffective Efforts}}{\text{Ineffective Efforts} + \text{Breaths}} \qquad (1) $$

It is defined as the ratio of the number of IEs to the combined number of IEs and breaths over a period of time. That period can be the entire recording or a shorter segment (e.g. from 1 h down to 5 min). The index gives a general idea of how the patient fared over that period. IEs have been reported to extend ventilation time and prolong hospitalization, and to affect mortality in patients with a high IE index (>10 %) [2, 9]. A question that arises is whether a patient with a low overall IE index over the course of hours still faces increased health risks because of short periods of intense IT activity. In such cases, even a much lower IE index might be related to patient deterioration. Moreover, a single feature might not be enough to sufficiently describe the complexity of a signal distribution.
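The following minimal Python sketch computes Eq. (1) both over a whole recording and over consecutive windows, to make the concern above concrete. It assumes hypothetical per-bin `(ie_count, breath_count)` tuples, such as 30 s counts after resampling. The function names are illustrative only.

```python
def ie_index(ie, breaths):
    """IE index of Eq. (1): IE / (IE + breaths); None if nothing was recorded."""
    total = ie + breaths
    return ie / total if total else None


def windowed_ie_index(bins, window_bins):
    """IE index over consecutive, non-overlapping windows of fixed-rate bins.

    bins: list of (ie_count, breath_count) tuples, e.g. per-30 s counts.
    window_bins: bins per window (10 bins of 30 s correspond to 5 min).
    """
    out = []
    for i in range(0, len(bins), window_bins):
        chunk = bins[i:i + window_bins]
        out.append(ie_index(sum(c[0] for c in chunk), sum(c[1] for c in chunk)))
    return out
```

Take one hour of 30 s bins (`[(0, 8)] * 120`) in which bins 60 to 69 are replaced by `(4, 4)`, i.e. a single 5 min burst with 40 IEs. The whole-hour IE index is 40/960, about 4 %, below the 10 % threshold. The 5 min window holding the burst, however, reaches 0.5.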

In order to address this issue, this paper suggests a set of indices that better describe the IE signal morphology. They can be divided into two categories:

  1. Indices that are calculated based on IE signal morphology and are independent of the IE event definition (Table 1).

     Table 1. Indices of 1st category
  2. Indices that are calculated based on the IE event definition (Table 2), and thus can vary depending on the researcher’s input.

     Table 2. Indices of 2nd category
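Because the 2nd-category indices depend on how an IE event is defined, a short Python sketch of one possible definition may help. It assumes a per-minute IE count series and flags runs of consecutive minutes with at least `min_rate` IEs that last at least `min_duration` minutes. The defaults follow the threshold of 10 IE per minute for at least 3 min reported in the Results section. The summary indices computed from the events (count, total and longest duration, fraction of time in events) are hypothetical examples, not the contents of Table 2.

```python
def detect_ie_events(ie_per_min, min_rate=10, min_duration=3):
    """Return (start, end) minute indices of IE events, end exclusive.

    An event is a run of consecutive minutes with at least min_rate IEs lasting
    at least min_duration minutes. Minutes marked None (artefacts) break a run.
    """
    events, run_start = [], None
    # A trailing -inf sentinel closes a run that reaches the end of the series.
    for i, rate in enumerate(list(ie_per_min) + [float("-inf")]):
        if rate is not None and rate >= min_rate:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_duration:
                events.append((run_start, i))
            run_start = None
    return events


def event_indices(events, total_minutes):
    """Hypothetical event-based indices, for illustration only."""
    durations = [end - start for start, end in events]
    return {
        "n_events": len(events),
        "total_event_min": sum(durations),
        "longest_event_min": max(durations, default=0),
        "fraction_in_events": sum(durations) / total_minutes if total_minutes else 0.0,
    }
```

Changing `min_rate` or `min_duration` changes which runs count as events, and with them every index derived from the events. This is why the 2nd-category indices vary with the researcher's input.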

3.4 Exposing ICU Analytics via Web-Based Platform

In order to make the developed tools accessible to physicians, a custom web-based ICU platform was created. The main goal was to present the analytic tools in a simplified way. Functionality is hidden from the user where the steps are already predefined, while users can still parameterize tools when required. The platform supports two major functionalities (Fig. 1).

Fig. 1. Overview of platform architecture

The first functionality is a module for data pre-processing and processing. Currently, only a PVI dataset module is in place. Physicians can upload pseudo-anonymized raw data, as extracted by the PVI monitor, to the database and then query raw datasets to be pre-processed as previously described. Afterwards, the user can select either or both of the two analysis pipelines currently in place. The first is feature extraction. The second estimates the correlation and phase delay between the ventilator-recorded signals based on wavelet coherence [7]. The wavelet coherence analysis explores the correlation between the time series and produces a set of features. These features are expected to give insight into the physiological phenomena that take place in the vicinity of an IE event, either preceding it (potential causes) or following it (potential consequences). In each case, the user can experiment with different thresholds for defining IE events, thus affecting the outcome of the analysis.
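To make the coherence pipeline more concrete, here is a minimal NumPy sketch of squared wavelet coherence between two evenly sampled signals. It uses a complex Morlet wavelet (w0 = 6) and boxcar smoothing along time only. The wavelet, the smoothing width, the period-to-scale conversion and the function names are all assumptions for illustration. The platform's own implementation [7] may differ, for example by also smoothing across scales and testing significance.

```python
import numpy as np


def morlet_cwt(x, dt, periods, w0=6.0):
    """Continuous wavelet transform with a complex Morlet wavelet.

    x: evenly sampled signal, dt: sampling step in seconds,
    periods: Fourier periods (seconds) to analyse.
    Returns a complex array of shape (len(periods), len(x)).
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    out = np.empty((len(periods), len(x)), dtype=complex)
    for k, period in enumerate(periods):
        # Morlet scale whose equivalent Fourier period equals `period`.
        scale = period * (w0 + np.sqrt(2 + w0 ** 2)) / (4 * np.pi)
        half = int(np.ceil(3 * scale / dt))  # truncate the envelope at 3 scales
        if 2 * half + 1 > len(x):
            raise ValueError("period too long for this signal length")
        u = np.arange(-half, half + 1) * dt / scale
        psi = np.pi ** -0.25 * np.exp(1j * w0 * u - u ** 2 / 2) / np.sqrt(scale)
        # W(b) = sum_t x(t) conj(psi((t - b) / s)): convolve with reversed conjugate.
        out[k] = np.convolve(x, np.conj(psi)[::-1], mode="same") * dt
    return out


def _smooth_time(a, width):
    """Boxcar smoothing along the time axis (one row per period)."""
    kernel = np.ones(width) / width
    return np.array([np.convolve(row, kernel, mode="same") for row in a])


def wavelet_coherence(x, y, dt, periods, width=9):
    """Squared wavelet coherence (0..1) and phase (radians) between x and y.

    Positive phase at a given period means y lags x at that period.
    """
    wx, wy = morlet_cwt(x, dt, periods), morlet_cwt(y, dt, periods)
    sxy = _smooth_time(wx * np.conj(wy), width)
    sxx = _smooth_time(np.abs(wx) ** 2, width)
    syy = _smooth_time(np.abs(wy) ** 2, width)
    coherence = np.abs(sxy) ** 2 / np.maximum(sxx * syy, 1e-300)
    return coherence, np.angle(sxy)
```

Take a sine with a 20 s period and a copy delayed by 2 s. At the 20 s period, the coherence is close to 1 and the phase is close to 2π·2/20 ≈ 0.63 rad. Dividing the phase by 2π/period converts it back into a time delay between the two signals.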

The features extracted from the ventilator dataset are combined with a subset of the patients’ clinical data that was retrieved offline from the hospital database.
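Combining the extracted features with the clinical data amounts to a join on a patient identifier. The Python sketch below assumes both tables are lists of dicts keyed by a hypothetical `patient_id` column. It keeps patients who have no clinical record and fills their clinical fields with `None`. Field names such as `age` or `icu_death` are placeholders, not the actual EHR fields.

```python
def combine_with_clinical(features, clinical, key="patient_id"):
    """Left-join per-patient ventilator features with clinical records.

    features, clinical: lists of dicts sharing the patient identifier `key`.
    Patients without a clinical record keep their features; their clinical
    fields are set to None so they show up as missing values downstream.
    """
    by_id = {row[key]: row for row in clinical}
    clinical_fields = sorted({k for row in clinical for k in row if k != key})
    merged = []
    for row in features:
        extra = by_id.get(row[key], {})
        merged.append({**row, **{f: extra.get(f) for f in clinical_fields}})
    return merged
```

A left join is used here so that no ventilator features are silently lost when a clinical record is missing. The gap instead surfaces as missing values that the cleaning step can handle.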

The second functionality provides a set of commonly used analytics and exploratory visualizations. The platform is agnostic to the type of data it is given, although it requires the data to be in tabular format. The user can apply data cleaning methods (handling out-of-bounds values, missing values, etc.) either automatically or manually through UI elements. The physicians can then run the statistical analytic functions they select from those available and evaluate the clinical outcomes they choose. For this part, R packages that implement well-known, established algorithms were used.
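The automatic cleaning option could work along the lines of this minimal Python sketch. It treats out-of-bounds values as missing and then either imputes column medians or drops incomplete rows. The bounds, the column names and the two strategies are assumptions chosen for illustration, since the platform itself relies on existing R packages.

```python
from statistics import median


def clean_table(rows, bounds, impute="median"):
    """Simple automatic cleaning of a table given as a list of dicts.

    bounds: {column: (low, high)}; values outside [low, high] become missing.
    impute="median" fills missing values with the column median of the
    remaining valid values; impute="drop" removes rows with any missing value.
    The input rows are not modified.
    """
    cleaned = []
    for row in rows:
        new = dict(row)
        for col, (low, high) in bounds.items():
            v = new.get(col)
            if v is not None and not (low <= v <= high):
                new[col] = None  # out of bounds -> treated as missing
        cleaned.append(new)

    if impute == "drop":
        return [r for r in cleaned if all(r.get(c) is not None for c in bounds)]
    if impute == "median":
        for col in bounds:
            valid = [r[col] for r in cleaned if r.get(col) is not None]
            if valid:
                fill = median(valid)
                for r in cleaned:
                    if r.get(col) is None:
                        r[col] = fill
        return cleaned
    raise ValueError(f"unknown impute strategy: {impute!r}")
```

For example, with `bounds = {"age": (0, 120), "hr": (20, 250)}`, an age of 250 is first set to missing and then replaced by the median of the valid ages.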

3.5 Web-Based Platform Implementation

The web-based platform was developed with Shiny, an R framework for building web applications. Data visualization was an important part of the platform. In order to convey information in a meaningful way, based on the users’ ever-changing needs, we decided to focus on interactive visualization tools and examined those available as R packages. As no single package could cover all of our needs, we chose the packages described in Table 3.

Table 3. Visualization Packages

Specifically, for Google Charts, a custom function was created to visualize an entire column-based data frame, showing both the individual columns and their pairwise relations (Fig. 2). For a data frame with N columns, a single mega-chart consisting of \( N \times N \) individual charts is created. Each sub-chart \( C_{ij} \) \( (i, j = 1, \ldots, N) \) depicts the relationship between the variables in columns i and j of the data frame. The visualization used depends on the combination of the two variable types. When \( i = j \), a single variable is depicted.
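The per-cell chart selection for the \( N \times N \) mega-chart can be sketched in Python as a lookup on the pair of variable types. The mapping used here is a hypothetical choice, since the text does not list which chart types the custom function uses:

- histogram or bar chart on the diagonal,
- scatter plot for two numerical variables,
- box plot for mixed pairs,
- stacked bar chart for two categorical variables.

The output is a grid of chart-type names rather than actual Google Charts objects.

```python
def chart_grid(columns):
    """Pick a chart type for every cell C_ij of the N x N mega-chart.

    columns: list of (name, kind) pairs, kind is "numeric" or "categorical".
    The type-to-chart mapping below is an illustrative assumption.
    """
    single = {"numeric": "histogram", "categorical": "bar"}
    pair = {
        ("numeric", "numeric"): "scatter",
        ("categorical", "categorical"): "stacked_bar",
        ("numeric", "categorical"): "box",
        ("categorical", "numeric"): "box",
    }
    grid = []
    for i, (_, kind_i) in enumerate(columns):
        row = []
        for j, (_, kind_j) in enumerate(columns):
            # Diagonal cells (i == j) show a single variable.
            row.append(single[kind_i] if i == j else pair[(kind_i, kind_j)])
        grid.append(row)
    return grid
```

With 3 numerical and 2 categorical columns, as in Fig. 2, this yields a 5 × 5 grid whose diagonal holds the single-variable charts.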

4 Results

Both the platform and the algorithms are currently used by physicians at PAGNI, and the functionality and interface have evolved based on their input. The following two figures show instances of the platform running an exploratory statistical analysis (Fig. 2) and a data processing pipeline.

Fig. 2. Mass visualization of 5 variables (3 numerical and 2 categorical) and their combinations. Red-colored charts represent the diagonal (single-variable charts) (Color figure online).

Based on the analysis enabled by the platform, the optimal threshold for defining respiratory events that relate to adverse outcomes was found to be 10 IEs per minute over a time span of at least 3 min. At the current stage of this research, a multivariate analysis adjusted for age and severity, with the significance level set at 0.05, showed that:

  • three indices are found to be related to ICU mortality

  • four indices are found to be related to hospital mortality

  • two indices are found to be related to the number of days spent on mechanical ventilation.

5 Discussion

As the data collection phase proceeds, the amount of available data (Volume) will increase in two ways. Recordings from additional equipment will be included, such as ICU monitors that record bio-signals like the electrocardiogram at a much higher sampling rate than the PVI monitor. The number of patients whose data are recorded will also grow. Provided that the research questions addressed with the AEGLE platform yield useful results regarding patient clinical outcome, our goal is to support streaming data and combine them with the acquired knowledge. This would allow us to build a decision support system that requires constant processing of live streaming data (Velocity). For reference, the wavelet-based analysis described above takes 2 to 10 s on workstation-level resources, depending on IE event duration. Either way, we are gradually stepping into Big Data territory, and performance will become an issue.

To address this, and as the AEGLE project moves towards the integrated platform, we are shifting our focus to analytics integration and optimisation. This will let us take advantage of the distributed storage and distributed processing provided by the combination of the RHadoop and SparkR frameworks.

The knowledge acquired from this analysis, powered by the platform, can be used to derive alarm thresholds that might eventually be applied at the clinical level.