Infection prediction using physiological and social data in social environments

https://doi.org/10.1016/j.ipm.2020.102213

Highlights

  • Infection diagnosis is critical in social environments; otherwise diseases may spread fast.

  • A clinical decision support system was developed for infection diagnosis in nursing homes using small data; it achieves an AUROC of 0.734.

  • In small data environments, external sources of data can improve diagnosis performance with little cost.

  • When considering additional data (social data and weather information), AUROC increases up to 0.798.

  • For diagnosis, smaller leads (historical data) are preferred, but larger leads improve the results when adding social data.

Abstract

Detecting infections at an early stage is an important problem in clinical environments. When an infection is not diagnosed in time, it may not only harm the infected patient but also spread to other people.

In this paper, we propose a clinical decision support system (CDSS) for diagnosing infections using clinical signals from patients. The system is designed to cope with small amounts of data (a single record per patient per day), making it convenient for environments under strict constraints (such as low resources or poor connectivity). Additionally, we have incorporated data from external sources in order to enrich the models. In particular, we have considered social data arising from web searches, retrieved from Google Trends, as well as weather data.

Clinical data were recorded between April 2018 and July 2019 in two nursing homes in Spain and one in the Dominican Republic, where nurses had also tested patients for infections. Feature extraction was carried out by aggregating measurements from the days before the infection was detected (lead) and the days after it (lag), and these features were used to train supervised learning models. The best model using only clinical data attains an AUROC of 0.734. When the data are enriched with external sources, performance increases up to an AUROC of 0.798. In the case of prognosis (i.e., when only measurements taken before the manual annotation of the infection are used), an AUROC of 0.719 is obtained using only clinical data, and up to 0.757 when combining additional sources of data.

In conclusion, the CDSS provides a good recognition performance given the small amounts of data available. This performance can be increased by including social data, which are readily available, and can therefore be useful in scenarios where clinical data acquisition is expensive or unfeasible.

Introduction

By the term “infection” we understand a broad variety of diseases caused by the invasion and multiplication of harmful microorganisms in the body, such as bacteria, fungi or viruses. In general, infections can begin in any organ of the body, and they might spread rapidly across it. Because infections can be of many different types, their consequences can also vary greatly depending on many factors. Some of these factors are the specific microorganisms involved, the organs affected, the subject’s health status prior to acquiring the infection, the availability of vaccines, etc. A good example of this variability can be found in influenza: it affects millions of people every year, but in most cases the consequences are not severe. However, in some extreme cases it can even cause death.

The variability of infections is found not only in their outcome, but also in the way in which they manifest. Some infectious diseases can remain asymptomatic and have no important consequences for a subject’s health. Others might show no symptoms during the early stages and then begin to display some gradually. Again, how the disease manifests throughout a subject’s body will depend not only on the infection itself, but also on the subject’s characteristics and even on environmental factors.

While infectious diseases can start in many different settings, in some cases they appear during a hospital stay. In fact, hospital-acquired infections (also known as nosocomial infections) are pervasive in both developed and developing countries (Khan, Baig, & Mehboob, 2017; Ogwang, Paramatti, Molteni, Ochola, Okello, Ortiz Salgado, Kayanja, Greco, Kizza, Gondoni, Okot, Praticò, Granata, Filia, Kellar Ayugi, & Greco, 2013; Petersen, Holm, Pedersen, Lassen, & Pedersen, 2010). In some cases, the severity of hospital-acquired infections can be very high, mostly due to the conditions occurring in such a context. The most threatening fact is that patients in a hospital often have a worse health status than average, either because they are elderly, because they suffer from other medical conditions, perhaps with a weakened immune system, or even because they are developing resistance to antibiotics.

Besides hospitals, infections can also start in other clinical settings. For example, studies have also been conducted on infections taking place in nursing homes (Montoya & Mody, 2011). This kind of environment can be particularly prone to infections because of the way in which interactions take place: unlike hospitals, nursing homes often promote social engagement between patients. Thus, patients are often involved in activities such as watching TV with other patients, playing cards, having lunch or drinks in a common room, etc. In such an environment, an infection can spread fast if patients at risk of contagion are not isolated in due time.

In this context, the ability to provide an early diagnosis of an infectious disease can be of the utmost importance. On the one hand, patients can be treated properly and in a timely manner by administering antibiotics or providing the care required to overcome the disease. On the other hand, early detection of the infectious process makes it possible to isolate patients at risk of contagion in order to avoid fast spread of the disease. This is extremely important given that in a nursing home infections can spread fast, and some patients might be prone to critical or even fatal conditions after acquiring the infection.

Unfortunately, diagnosing infections in a clinical setting is generally a difficult problem. Confounding factors include other medical conditions affecting the patients, which make diagnosis harder than in otherwise healthy patients because some symptoms may overlap. For these reasons, an automatic tool serving as a clinical decision support system (CDSS) able to detect infections at early stages would be of great interest and use. An example of how an information management system is used to perform intelligent diagnosis in the medical domain can be found in the work of Liu, Qi, Xu, Gao, and Liu (2019).

The objective of this paper is to design, develop and validate such a CDSS for detecting infections in nursing home patients. In this work, three different groups of infectious diseases are considered: urinary tract infections, acute respiratory infections, and skin and soft tissue infections. We approach the design of such a system by means of computational intelligence techniques, and more specifically machine learning. To do so, clinical data are first acquired from residents at three nursing homes, located in Spain and the Dominican Republic. The number of residents is small and, as often happens in clinical settings, data involving infections are very imbalanced. Also, clinical data (such as temperature, blood pressure or electrodermal activity) are scarce, since nurses take measurements only once per day. For this reason, we also incorporate social and environmental data into the model, expecting results to improve after considering this external information.

Given this approach of small data and the inclusion of environmental information which is readily available online, the CDSS designed in this work could be useful in real-world scenarios with limited material and human resources. Therefore, the deployment of such a system could be affordable in developing countries, since it only requires a nurse to take manual measurements of patients, requiring nothing else beyond standard medical equipment and a computer device. Conversely, other alternatives involving high-frequency monitors or more complex hardware might be unfeasible to deploy.

The remainder of this paper is structured as follows: Section 2 describes related work involving the diagnosis of infections, particularly works relying on physiological signals under small data scenarios and works involving social and environmental data. Then, Section 3 describes the stage of data acquisition, referring both to clinical data and social data, and how data was pre-processed in order to make it suitable for the diagnosis task. Later, Section 4 provides some insights about the data; i.e. descriptive facts about how clinical and social/environmental data are related, leaving Section 5 to describe the process for designing and developing the CDSS, including the end-to-end machine learning mechanisms involved. Finally, Section 6 presents and discusses the performance of the developed system, while Section 7 provides conclusive remarks and proposes some future lines of work.

Related work

Because of its clinical interest, infection diagnosis is extensively covered in the medical literature, with many different approaches and results. However, from our survey of the state of the art, most works are based on medical procedures and approaches. Works applying machine learning are far fewer, and in most cases they focus on specific types of infections.

So far, there exist some

Data acquisition and processing

In this section we will describe the data acquisition stage, as well as how this data was processed in order to proceed to the diagnosis stage. In this work we are using two sources of data: clinical data, obtained from medical devices in nursing homes, and public internet data from different sources.
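As a minimal sketch of how the two data sources could be combined, the following Python snippet left-joins daily clinical records with external data on the site and the day. All field names (`site`, `flu_search_index`, `outdoor_temp_c`), the sample values and the join key are illustrative assumptions; the text above does not specify how the records are actually keyed or stored.

```python
from datetime import date

# Hypothetical daily clinical records: one row per patient per day.
clinical = [
    {"patient": "p01", "site": "ES", "day": date(2018, 5, 1), "temperature": 36.6},
    {"patient": "p01", "site": "ES", "day": date(2018, 5, 2), "temperature": 37.9},
    {"patient": "p02", "site": "DO", "day": date(2018, 5, 1), "temperature": 36.4},
]

# Hypothetical external data keyed by (site, day): search interest and weather.
external = {
    ("ES", date(2018, 5, 1)): {"flu_search_index": 12, "outdoor_temp_c": 18.0},
    ("ES", date(2018, 5, 2)): {"flu_search_index": 15, "outdoor_temp_c": 16.5},
}

# Values used when no external record exists for a (site, day) pair.
DEFAULTS = {"flu_search_index": None, "outdoor_temp_c": None}


def enrich(records, external_by_key, defaults):
    """Left-join clinical records with external data on (site, day)."""
    enriched = []
    for rec in records:
        extra = external_by_key.get((rec["site"], rec["day"]), defaults)
        enriched.append({**rec, **extra})
    return enriched


rows = enrich(clinical, external, DEFAULTS)
```

A left join keeps every clinical record even when no external observation is available for that day, which matters when clinical data are scarce; how missing external values should then be handled is left open here.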

Data analysis

The main objective of the study is to develop a decision support system that monitors the physiological condition and social environment of a patient over time in order to detect an infection earlier. This requires designing a model that takes temporal information into account. However, we first need to understand how physiological and social factors relate to the occurrence of infection in individuals.

To this end, a previous analysis of instant measurements data is performed in this
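The abstract describes feature extraction as aggregating measurements from the days before the infection (lead) and after its detection (lag). A minimal sketch of that idea for one signal of one patient could look as follows. The function name `window_features`, the choice of mean/min/max as aggregates, the handling of missing days, and the inclusion of the index day in the window are all assumptions made for illustration, not details confirmed by the source.

```python
from statistics import mean


def window_features(series, index, lead, lag):
    """Aggregate one signal over days [index - lead, index + lag].

    series: daily values for one patient, ordered by day (None = missing).
    The window is clipped to the available days and missing values are skipped.
    """
    start = max(0, index - lead)
    stop = min(len(series), index + lag + 1)
    values = [v for v in series[start:stop] if v is not None]
    if not values:
        return {"mean": None, "min": None, "max": None}
    return {"mean": mean(values), "min": min(values), "max": max(values)}


# Hypothetical daily temperatures; day 4 is the annotated day.
temps = [36.5, 36.6, None, 37.2, 38.1, 37.8, 36.9]

# Diagnosis setting: days before and after the annotated day are used.
diagnosis_feats = window_features(temps, index=4, lead=2, lag=1)

# Prognosis setting: no measurements after the annotated day (lag = 0).
prognosis_feats = window_features(temps, index=4, lead=2, lag=0)
```

Setting `lag=0` mimics the prognosis case, in which nothing recorded after the annotation is available to the model.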

Diagnosis

In this work, infection diagnosis will be approached as a supervised learning problem. In particular, due to the lack of data, we have decided to tackle this problem as binary classification; i.e., we will be interested in the diagnosis of whether a patient suffers or not from an infection, without caring about its type. In the context of this project, this is a convenient trade-off, given that the system is expected to behave as a mere decision support system, to just alert clinicians that a
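The binary formulation can be illustrated with a short sketch that collapses the three infection groups into a single positive class and reports how imbalanced the resulting labels are. The annotation codes below are hypothetical; the source names the three groups but not how they are encoded.

```python
from collections import Counter

# Hypothetical annotation codes for the three infection groups considered.
INFECTION_TYPES = {"urinary_tract", "acute_respiratory", "skin_soft_tissue"}


def to_binary(annotation):
    """Map a per-day annotation to 1 (any infection) or 0 (no infection)."""
    return 1 if annotation in INFECTION_TYPES else 0


# Hypothetical annotations for six patient-days.
annotations = ["none", "urinary_tract", "none", "none", "acute_respiratory", "none"]
labels = [to_binary(a) for a in annotations]

# Class balance: the share of positive (infected) patient-days.
counts = Counter(labels)
positive_rate = counts[1] / len(labels)
```

Merging the types gives the classifier more positive examples per class than a three-way split would, at the cost of not telling clinicians which infection is suspected.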

Results and discussion

To train and evaluate the CDSS we have followed a cross validation scheme using 3 folds. According to this approach, we will first randomly shuffle the data and then divide it into three subsets of equal size. Once the dataset is divided, the training and testing process is run three times, each time using two of the subsets for training and the remaining one for validation. Each of these repetitions outputs some performance metrics, which are then aggregated. By following cross-validation, the
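The evaluation scheme described above (shuffle, split into three subsets, train on two, test on the third, aggregate) can be sketched in plain Python together with the AUROC metric reported in the abstract. The one-feature scorer (`fit_midpoint`, `predict_score`) is a hypothetical stand-in for the actual models, and skipping a fold whose test labels contain a single class is a choice of this sketch, not something stated in the source.

```python
import random
from statistics import mean


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((1.0 if p > n else 0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def three_fold_indices(n, seed=0):
    """Shuffle indices 0..n-1 and split them into three near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::3] for i in range(3)]


def fit_midpoint(xs, ys):
    """Hypothetical scorer: stores the midpoint between the two class means."""
    pos_mean = mean(x for x, y in zip(xs, ys) if y == 1)
    neg_mean = mean(x for x, y in zip(xs, ys) if y == 0)
    sign = 1.0 if pos_mean >= neg_mean else -1.0
    return {"midpoint": (pos_mean + neg_mean) / 2, "sign": sign}


def predict_score(model, x):
    """Higher score means 'more likely infected' under the hypothetical scorer."""
    return model["sign"] * (x - model["midpoint"])


def cross_validate(xs, ys, seed=0):
    """Run 3-fold cross-validation and return the mean AUROC over usable folds."""
    folds = three_fold_indices(len(ys), seed)
    results = []
    for k in range(3):
        test = folds[k]
        train = [i for j in range(3) if j != k for i in folds[j]]
        y_test = [ys[i] for i in test]
        if len(set(y_test)) < 2:
            continue  # AUROC is undefined without both classes (sketch-level choice)
        model = fit_midpoint([xs[i] for i in train], [ys[i] for i in train])
        scores = [predict_score(model, xs[i]) for i in test]
        results.append(auroc(y_test, scores))
    return mean(results)


# Hypothetical, perfectly separable toy data: one temperature-like feature.
xs = [36.0 + 0.1 * i for i in range(15)] + [38.0 + 0.1 * i for i in range(15)]
ys = [0] * 15 + [1] * 15
cv_auroc = cross_validate(xs, ys)
```

With heavily imbalanced labels, a purely random split can leave a fold with no positive cases; stratifying the folds by label is a common remedy, although the source does not say whether it was used.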

Conclusions

In this paper, we have described a clinical decision support system (CDSS) for early infection diagnosis, focusing on three different types of infections: urinary tract infections, acute respiratory infections, and skin and soft tissue infections. This problem is clinically relevant, since early diagnosis brings direct benefits to the patients (through timely administration of the corresponding drugs) and to their surroundings (by preventing fast spread of the infection).

These benefits are particularly

CRediT authorship contribution statement

Alejandro Baldominos: Data curation, Formal analysis, Methodology, Software, Validation, Writing - original draft. Hasan Ogul: Data curation, Formal analysis, Methodology, Writing - review & editing. Ricardo Colomo-Palacios: Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Writing - review & editing. José Sanz-Moreno: Investigation, Resources, Supervision, Validation. José Manuel Gómez-Pulido: Conceptualization, Funding

Acknowledgments

This work has been partially funded by the ERANet-LAC Programme under grant number ELAC2015/T09-0819 (SPIDEP Project – Design and implementation of a low cost smart system for prediagnosis and telecare of infectious diseases in elderly people).
