Optical method supported by machine learning for urinary tract infection detection and urosepsis risk assessment

The study presents an optical method supported by machine learning for discriminating urinary tract infections from infections capable of causing urosepsis. The method relies on absorbance spectra of artificial urine samples inoculated with bacteria from solid cultures of clinical E. coli strains. To provide a reliable classification of the results, 27 machine learning algorithms were tested. We showed that it is possible to obtain up to 97% accuracy with the measurement method when it is supported by machine learning. The method was validated on urine samples from 241 patients. The advantages of the proposed solution are the simplicity of the sensor, mobility, versatility, and low cost of the test.


| INTRODUCTION
Sepsis is one of the causes of morbidity and mortality in hospitals. It is estimated that about a quarter of all sepsis cases are due to urinary tract infection (UTI) [1,2]. Prompt administration of the appropriate drug (antibiotic) targeting the correct causative agent (bacterium) increases the chance of patient survival.
The Society of Critical Care Medicine and the European Society of Intensive Care Medicine established a sepsis council, and in 2016 a new definition of sepsis was published under the name SEPSIS-3 [3]. According to the current definition, "Sepsis is a life-threatening organ dysfunction caused by an abnormal (deregulated) response of the body to infection (bacterial, fungal, viral)." When the SOFA score [4,5] increases by at least 2 points in a patient with suspected infection, the risk of mortality is estimated at about 10%. It is estimated that there are about 49 million cases of sepsis per year, with a mortality rate exceeding 23%, which accounts for about 20% of all deaths worldwide [6]. Due to its widespread prevalence, high mortality, and the costs related to both treatment and the subsequent consequences of sepsis, it is both a medical and an economic problem on a global scale. Hence, in May 2017, the World Health Assembly (WHA), the decision-making body of the WHO, adopted an important resolution on sepsis, intended to encourage member countries to make a joint effort to prevent, diagnose, and treat it.
Bacterial infections, especially those related to the urinary tract, are the most common cause of sepsis. Escherichia coli is most often responsible for urosepsis, accounting for as much as 52% of cases, followed by bacteria belonging to Proteus spp., Enterobacter spp., Klebsiella spp., and P. aeruginosa [7]. E. coli causes inflammation of the bladder, which can progress to acute pyelonephritis (15%-50% of cases), and 12% of these patients later develop bacteremia and sepsis. Infections with uropathogenic E. coli (UPEC) are also associated with frequent relapses and complications. It is estimated that UTIs account for about 10%-20% of all community-acquired infections and about 40%-50% of nosocomial infections [8][9][10].
The presence of bacteria or their products in the urinary tract stimulates a rapid immune response: neutrophil influx is observed, and proinflammatory cytokines (IL-6, IL-8, TNF-α) and nitric oxide are produced [11][12][13][14][15]. However, it is the receptors of the TLR family, which recognize uropathogens, that initiate an effective immune response. The epithelial cells of the bladder show a high expression of TLR 4, while the expression of TLR 4 in the epithelial cells of the kidney is low, which makes the response to uropathogenic bacteria much weaker [16,17], so the kidney-blood barrier may be overcome.
In bedside sepsis diagnosis, a shortened version of the SOFA parametric test, the so-called Quick-SOFA, allows one to quickly assess the condition of a patient with an infection. This approach does not require complicated equipment, only a device for measuring blood pressure and observation of the patient (altered state of consciousness, systolic arterial pressure < 100 mmHg, and respiratory rate ≥ 22 breaths per minute). Identifying the etiological agent of sepsis is essential to institute appropriate treatment. Each hour of delay, for example in the administration of antibiotics, reduces the survival rate among patients by an average of 7.6% [7]. In the case of bacteremia, the number of bacteria in the blood may be below 10 CFU/mL, and such a low value significantly hinders diagnosis [18]. Therefore, it is crucial to use sensitive, specific, and quick diagnostic methods to detect the causative agent of sepsis and to introduce appropriate therapy, which will increase the survival rate of patients with suspected sepsis.
Culturing blood on appropriate culture media, known as "blood cultures," has been treated for many years as the "gold standard" in the diagnosis of sepsis. Blood cultures are performed to detect viable sepsis organism(s), and large hospital laboratories use automated continuous blood culture monitoring systems (BACTEC™ FX/9000, Becton Dickinson, NJ), systems based on colorimetry (BacT/ALERT, bioMérieux, France), or systems based on pressure variations associated with the release and consumption of gases (VersaTREK, TREK Diagnostic Systems, OH, USA). However, the bacteria/fungi responsible for sepsis can be cultured in only 30% of sepsis patients [19]. This is because the number of microorganisms present in the circulation during bloodstream infection is usually low, ranging from 1 to 1 × 10^4 CFU/mL. In addition, false-negative blood cultures may result from too small a volume of blood being taken for testing [20]. Moreover, the drug sensitivity assessment becomes available after 24-72 h at the earliest, whereas current sepsis treatment guidelines promote the initiation of antibiotic treatment within 1-3 h. Therefore, antibiotic therapy is empirical and is intentionally broad-spectrum, to avoid missing the drug to which the pathogen is sensitive.
Currently, the method commonly used to differentiate a generalized inflammatory reaction from sepsis is the determination of CRP protein combined with procalcitonin (PCT) in the patient's blood [21]. For this purpose, a laboratory immunofluorescent assay or an immunochromatographic point-of-care test can be used to determine PCT [22]. However, although the PCT test has the highest accuracy for sepsis diagnosis, with approximately 80%-90% sensitivity for the identification of serious bacterial infections, PCT levels peak only 24-48 h after the onset of sepsis [23][24][25].
To diagnose urinary tract infections and implement appropriate treatment as quickly as possible, there is a need for a rapid and inexpensive means of diagnosing such infections, especially one that allows differentiation of uropathogenic E. coli strains (UPEC). In this paper, we present a method of discriminating urinary tract infections from urinary tract infections capable of causing urosepsis. The proposed method is based on absorption, allowing the determination of changes in radiation intensity, similar to the solutions we described previously [26,27]. The amount of absorbed radiation depends on the concentration [28] and on the presence of metabolites and proteins secreted by bacteria in each supernatant obtained from a patient's urine sample. The proposed spectrometric technique, combined with a classification algorithm, makes it possible to perform precise measurements easily.

| MATERIALS AND METHODS
We measured the changes in absorbance resulting from the different metabolites/proteins produced during UTI or by uroseptic E. coli strains. The method comprises absorbance measurements in the spectral range from 600 to 750 nm. Analysis of the registered signals makes it possible to distinguish urinary tract infections from urinary tract infections that can cause urosepsis.
Clinical E. coli strains (241) were obtained from the University Clinical Centre in Gdańsk from patients' urine/blood samples. We collected 86 UTI samples (positive bacterial culture from urine, negative from blood) and 155 urosepsis samples (positive bacterial culture from urine, positive from blood). The collection was carried out in accordance with bioethical committee agreement No. NKBBN/133/2019.
To investigate whether the proposed optical method can discriminate urinary tract infections in biological fluids, we followed the steps shown in Figure 1: (i) urine samples were collected from the patient, the bacteria were stored on cryopreservation beads, and a liquid medium was then inoculated with the cryopreserved microorganisms; (ii) the liquid culture was grown until it reached an optical density (OD) that allowed for further analysis (OD600 > 0.2); (iii) samples were separated into supernatant and pellet; (iv) absorbance measurements of the supernatant were conducted; (v) from the obtained data, the training set and the validating set were prepared to train and validate the model; (vi) during the preprocessing phase, the noise was removed from the spectra, a specific range for feature characterization was chosen, and the two most important features were selected, namely the area under the graph and the variance of the optical signal; (vii) finally, patients were classified as UTI or urosepsis according to the established criteria.

| Materials
The minimal medium was prepared according to the following recipe. To 1 L of ultrapure water, 6 g/L disodium hydrogen phosphate (Molecular Biology Grade, Merck, Germany), 3 g/L potassium dihydrogen phosphate (Molecular Biology Grade, Merck, Germany), 0.5 g/L sodium chloride (Molecular Biology Grade, Merck, Germany), and 1 g/L ammonium chloride (Molecular Biology Grade, Merck, Germany) were added and mixed until the components dissolved. The final pH of the solution was set at 7.4 at 25 °C. The solution was then autoclaved at 121 °C for 15 min, followed by the addition of 2 mL of 1 M magnesium sulfate (Molecular Biology Grade, Merck, Germany) solution and 20 mL of 20% glycerol solution (Molecular Biology Grade, Merck, Germany). Both solutions were previously subjected to sterile filtration through a 0.2 μm pore diameter filter.
Sterile liquid medium (10 mL) was inoculated with rejuvenated bacteria from a solid culture of clinical E. coli strains. The culture was then incubated for 24 h at 37 °C with shaking at 70 rpm. After incubation, two sets of measurements were made: (i) before and (ii) after centrifugation to separate the supernatant from the bacterial pellet (4000 × g, 20 min, 25 °C). The supernatant was the sample subjected to absorbance measurements. The sample preparation scheme is shown in Figure 2.

| Normalization
Before the supernatant from set (ii) was measured to discriminate UTI from urosepsis, set (i) was measured to obtain the normalization parameter.
A 2 mL sample of the liquid culture before centrifugation (i) was taken and placed in the measurement system. The absorbance value at 600 nm was then read, giving the normalizing parameter OD (optical density at 600 nm). Studies have shown that cell density in the sample is proportional to the OD. The OD is usually defined as the absorbance of the sample at 600 nm [29]:
$$\mathrm{OD}_{600} \propto \frac{N}{V} \qquad (1)$$
where N is the number of microbial cells and V is the volume.
FIGURE 1 Diagram of the proposed method of UTI detection and urosepsis risk assessment.
The OD parameter indicates the number of microbial cells in the sample. The number of cells needed depends on the sensitivity of the measurement system. In our method, we estimated that an OD greater than 0.2 indicates that the biological material from the sample can be used for further preparation. Next, the sample is separated into pellet and supernatant to make the set used for UTI and urosepsis discrimination. The supernatant sample was placed in the measuring device. Measurements were performed on each sample in the range of 190-840 nm with a wavelength increment of 1 nm. The obtained data were normalized using the OD parameter.
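The OD check and normalization described above can be sketched in a few lines of Python. This is only an illustration: the text does not state the exact form of the normalization, so dividing each absorbance value by OD600 is an assumption, and the function name `normalize_by_od` and the sample values are hypothetical.

```python
from typing import List

OD_THRESHOLD = 0.2  # minimum OD600 stated in the text for a usable culture


def normalize_by_od(absorbance: List[float], od600: float) -> List[float]:
    """Scale a supernatant spectrum by the culture's OD600.

    Division is an ASSUMED form of normalization; the text only says the
    data were "normalized using the OD parameter".
    """
    if od600 <= OD_THRESHOLD:
        raise ValueError(f"OD600={od600} is not above {OD_THRESHOLD}")
    return [a / od600 for a in absorbance]


# Hypothetical absorbance readings at three wavelengths (not measured data).
spectrum = [0.10, 0.08, 0.06]
normalized = normalize_by_od(spectrum, od600=0.5)
print(normalized)
```

Rejecting cultures at or below the threshold before normalizing mirrors the OD600 > 0.2 criterion and avoids dividing by very small OD values.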

| Measurement system
In the experiment, we used a NanoDrop 2000 Spectrophotometer (Thermo Scientific, Wilmington, DE). The wavelength range of the spectrophotometer is from 190 to 840 nm, with wavelength accuracy equal to ±1 nm and bandwidth equal to 1.8 nm. The absorbance measurement accuracy is 3% (at 0.74 Abs at 350 nm).
The device consists of a xenon flash lamp as a light source, a monochromator, and silicon CCD photodiodes as detectors. The light emitted by the source is split between the reference channel and the sample channel. In the sample channel, the light passes through the sample, propagates to the monochromator, and then reaches the detector. In the reference channel, the light propagates directly to the monochromator and is then detected. The obtained data are processed and displayed on the computer.
The absorbance of the supernatant was measured over the entire wavelength range of 190-840 nm, but we decided to use only the 600-750 nm region for analysis. Using a narrower range for UTI discrimination makes the method more versatile and allows the use of different detectors.

| Classification
To understand the different metrics used to evaluate classifiers, it is important to understand the following terms:
1. True Positive (TP): the model correctly identifies a positive sample (e.g., a sample labeled as "UROSEPSIS" is classified as "UROSEPSIS").
2. True Negative (TN): the model correctly identifies a negative sample (e.g., a sample labeled as "UTI" is classified as "UTI").
3. False Positive (FP): the model incorrectly identifies a negative sample as positive (e.g., a sample labeled as "UTI" is classified as "UROSEPSIS").
4. False Negative (FN): the model incorrectly identifies a positive sample as negative (e.g., a sample labeled as "UROSEPSIS" is classified as "UTI").
Accuracy is a basic and simple measure of the quality of a model. It is calculated by dividing the number of correctly classified samples by the total number of samples. Mathematically, it can be expressed as:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (2)$$
Balanced accuracy is a measure of the model's ability to correctly classify both positive and negative samples. It is the mean of the true positive rate and the true negative rate:
$$\mathrm{Balanced\ accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right) \qquad (3)$$
Receiver operating characteristic area under the curve (ROC AUC) is a metric that measures the ability of a classifier to distinguish between positive and negative classes. The ROC curve can be plotted by using Equations (4) and (5), which give the true positive rate (TPR) and the false positive rate (FPR):
$$\mathrm{TPR} = \frac{TP}{TP + FN} \qquad (4)$$
$$\mathrm{FPR} = \frac{FP}{FP + TN} \qquad (5)$$
F1 Score is a metric that measures the ability of a model to recognize true positives. It is calculated by taking the harmonic mean of precision and recall:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (6)$$
Precision is the proportion of samples classified as positive that are truly positive, and is calculated as the number of TPs divided by the sum of the TPs and FPs. Recall, on the other hand, is a measure of the model's ability to detect positive samples, and is calculated as the number of TPs divided by the sum of the TPs and FNs, as shown in the following equations:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (7)$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (8)$$
A confusion matrix is a graphical representation of the performance of a classifier, showing a summary of the TP, TN, FP, and FN predictions made by the model, as shown in Figure 3.
FIGURE 2 Diagram of sample preparation: (i) sample for checking the number of microbial cells in the sample (OD600 measurement), (ii) sample for absorbance measurement for UTI discrimination.
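As a compact illustration of the metric definitions above, the following Python sketch computes them from the four confusion-matrix counts. The class name `Confusion` and the counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # urosepsis classified as urosepsis
    tn: int  # UTI classified as UTI
    fp: int  # UTI classified as urosepsis
    fn: int  # urosepsis classified as UTI

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    def recall(self) -> float:  # true positive rate
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:  # true negative rate
        return self.tn / (self.tn + self.fp)

    def false_positive_rate(self) -> float:
        return self.fp / (self.fp + self.tn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def balanced_accuracy(self) -> float:
        return (self.recall() + self.specificity()) / 2

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)


# Hypothetical counts for illustration only (not results from the study).
cm = Confusion(tp=45, tn=25, fp=5, fn=5)
print(f"accuracy={cm.accuracy():.3f}, "
      f"balanced={cm.balanced_accuracy():.3f}, F1={cm.f1():.3f}")
```

With unequal class sizes, as in this dataset (86 UTI and 155 urosepsis isolates), balanced accuracy can differ noticeably from plain accuracy, which is why both are worth reporting.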

| RESULTS
To discriminate UTI from urosepsis infections, absorption measurements were carried out. For this purpose, urine samples were collected from 241 patients: patients with confirmed E. coli-induced UTI (86 isolates) and patients with confirmed E. coli-induced urosepsis (155 isolates). Samples were then prepared according to the procedure described in Section 2.1 using the collected urine. The absorbance measurements were performed for each of the obtained samples, as shown in Figure 4.
Measurements of samples containing septic material are characterized by lower absorbance values than measurements of UTI samples, but the differences are minor, so further analysis is needed. Because of the lack of clear correlations, it was decided to use artificial intelligence techniques, specifically machine learning. The main goal was to create relatively fast-learning models that would nevertheless provide a high level of prediction accuracy. For this reason, the models that were used are categorized as weak learners. The use of the Lazy Predict library made it possible to test as many as 27 such algorithms, which gives a preliminary estimate of which groups of algorithms achieve the best results. To ensure proper verification of the learned models, the dataset was split into two parts: two-thirds was used for training and the remaining one-third was used for testing to validate the model. During the preprocessing phase, the noise was removed from the spectra using a low-pass filter, and the spectral range from 600 to 750 nm was used. Relevant features were then selected through feature engineering. From the 59 features determined based on the strength of the optical signal, the two most important ones were selected: the area under the graph and the variance of the optical signal. The best algorithms were then meta-optimized to improve performance. The accuracy results of each of the 27 algorithms in two cases, with and without normalization, are shown in Figure 5. All the presented results were obtained using the validation dataset.
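The preprocessing and feature extraction described above can be illustrated with a short Python sketch. It is not the authors' implementation: the text does not specify the low-pass filter, so a centered moving average is used here as an assumed stand-in, and all function names and the synthetic linear spectrum are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple


def moving_average(values: Sequence[float], width: int = 5) -> List[float]:
    """Assumed low-pass filter: centered moving average, shrinking at the edges."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def select_window(wl: Sequence[float], ab: Sequence[float],
                  lo: float = 600.0, hi: float = 750.0):
    """Keep only the 600-750 nm region used for analysis in the text."""
    pairs = [(w, a) for w, a in zip(wl, ab) if lo <= w <= hi]
    return [w for w, _ in pairs], [a for _, a in pairs]


def area_under_curve(wl: Sequence[float], ab: Sequence[float]) -> float:
    """Trapezoidal integration of absorbance over wavelength."""
    return sum((wl[i + 1] - wl[i]) * (ab[i] + ab[i + 1]) / 2
               for i in range(len(wl) - 1))


def extract_features(wl: Sequence[float], ab: Sequence[float]) -> Tuple[float, float]:
    """Return (area under the graph, variance) of the smoothed, windowed spectrum."""
    smoothed = moving_average(ab)
    wl_win, ab_win = select_window(wl, smoothed)
    return area_under_curve(wl_win, ab_win), pvariance(ab_win)


# Synthetic linear spectrum on the instrument grid (190-840 nm, 1 nm step).
wavelengths = list(range(190, 841))
absorbance = [0.001 * (840 - w) for w in wavelengths]
area, variance = extract_features(wavelengths, absorbance)
print(f"area={area:.2f}, variance={variance:.5f}")
```

Smoothing is applied to the full spectrum before the window is cut, so that edge effects of the filter fall outside the 600-750 nm region.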
The following quality metrics were determined for each of the tested algorithms: ROC AUC, F1, confusion matrix, and time taken. The results obtained are presented in Table S1 (for raw data) and Table S2 (for normalized data). The best accuracy for raw data was obtained by the Extra Trees model at 97%, while for normalized data, the highest accuracy was obtained by the K-neighbors algorithm at 95%.
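To make the evaluation protocol concrete, the sketch below applies a two-thirds/one-third split and a k-nearest-neighbors vote to two-feature samples. It is a pure-Python stand-in written for illustration, not the library classifier used in the study; the data are synthetic, well-separated clusters, and the choice of k = 3 and the random seed are assumptions.

```python
import math
import random
from collections import Counter
from typing import List, Tuple

Sample = Tuple[Tuple[float, float], str]  # ((area, variance), label)


def knn_predict(train: List[Sample], x: Tuple[float, float], k: int = 3) -> str:
    """Label x by majority vote among its k nearest training samples."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Synthetic, well-separated feature clusters (NOT data from the study).
rng = random.Random(0)
data: List[Sample] = []
for _ in range(15):
    data.append(((10 + rng.uniform(-0.5, 0.5), 1.0 + rng.uniform(-0.1, 0.1)), "UTI"))
    data.append(((5 + rng.uniform(-0.5, 0.5), 0.5 + rng.uniform(-0.1, 0.1)), "UROSEPSIS"))

# Two-thirds for training, one-third held out, as described in the text.
rng.shuffle(data)
split = 2 * len(data) // 3
train, test = data[:split], data[split:]

correct = sum(knn_predict(train, x) == label for x, label in test)
accuracy = correct / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because k-nearest-neighbors relies on Euclidean distance, features on very different scales would normally be standardized first; that step is omitted here for brevity.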
The data obtained and the classification analysis allowed for the selection of optical culture parameters that enable a virtually zero-one identification of individual strains. As can be seen, the best classification for normalized data was obtained with the KNeighborsClassifier, reaching around 95% accuracy. This is a very high result, considering that each sample was a separate cell culture of a distinct clinical strain with a different OD value. We prepared the cultures in three different batches to ensure the quality of the data. The absorbance measurements were done ungrouped (the samples of UTI and uroseptic strains were mixed, and the proper blanks were also introduced between sample measurements); even so, as presented in Tables S1 and S2, we achieved very good classification results.
The approach used also allowed for the verification of many classification algorithms, which, based on the collected data, allowed us to separate the samples into two groups with high accuracy. Such a result indicates that the optical parameters of the cell culture and, consequently, the metabolites and proteins secreted into the medium by individual strains are unique. As indicated in the literature [30][31][32], many differences can be observed within the metabolites responsible for the processes of energy production in urosepsis bacteria, as well as the proteins responsible for chelating iron (siderophores). Importantly, our approach of cultivating bacteria on a minimal medium allowed us to verify the optical properties of bacterial cultures, eliminating the risk of contamination of patients' urine by antibiotics, drugs, or other substances that come, for example, from the food chain.
FIGURE 3 Example of the confusion matrix.
FIGURE 4 Absorbance measurement of supernatant samples from liquid culture (confirmed E. coli-induced UTI and E. coli-induced urosepsis); pure water was used as a reference, and the liquid medium was used as blank.

| CONCLUSION
The proposed method can be used in medical laboratories, hospitals, and veterinary facilities. The optical technique can be an alternative to the current method of measuring procalcitonin in the patient's blood [33] and to blood and urine culture, which takes roughly 24 h. Measurements in the proposed solution take a few seconds; additional advantages are the simplicity and mobility of the system, its versatility, and the low cost of the test. The method was validated on samples from 241 patients. The achieved classification accuracy depends on the chosen algorithm and reaches up to 97% for raw data, eliminating the need for additional measurements to normalize the dataset. Sample preparation is the most time-consuming part because it is necessary to grow bacteria in the solution. The time needed for bacterial growth depends on the sensitivity of the measuring equipment. The measurement system can easily be replaced by a different spectrometer, so future studies should focus on reducing the time needed for bacterial incubation by trying more sensitive detectors.
The versatility of the method and of the measurement equipment used will allow for the analysis of other changes in urine associated with changes in the obtained spectrum. The proposed method focuses on measurements in the 600-750 nm range, but the equipment used makes it possible to measure a wider spectrum. Absorption changes at other wavelengths may be associated with other bacteria. This means that the method can be developed to perform classification over another or even a wider spectral range using the proposed algorithms, which may enable the detection of multiple changes in urine simultaneously. To detect changes associated with the appearance of other bacteria, a new predictive model is necessary.
Future studies will attempt to increase the number of measurements when more samples become available. Additionally, the parameters of the algorithms are currently selected manually; to improve the efficiency of the machine learning algorithms, we plan to use classical genetic and quantum-inspired algorithms for automatic meta-optimisation, which should further improve the method.