A new method for identifying the acute respiratory distress syndrome disease based on noninvasive physiological parameters

Early diagnosis and prevention play a crucial role in the treatment of patients with ARDS. The definition of ARDS requires an arterial blood gas to define the ratio of partial pressure of arterial oxygen to fraction of inspired oxygen (PaO2/FiO2 ratio). However, many patients with ARDS do not have a blood gas measured, which may result in under-diagnosis of the condition. Using data from the MIMIC-III database, we propose an algorithm based on noninvasive physiological parameters to estimate P/F levels and thereby aid in the diagnosis of ARDS. Machine learning algorithms were combined with filter feature selection methods to study the correlation of various noninvasive parameters with the identification of ARDS. Cross-validation techniques were used to verify the performance of the algorithms for different feature subsets. XGBoost using the optimal feature subset had the best identification performance, with a sensitivity of 84.03%, a specificity of 87.75%, and an AUC of 0.9128. For all four machine learning algorithms, the AUC remained above 0.8 even after a substantial number of features had been removed. Compared to the Rice Linear Model, this method has the advantages of high reliability and of continually monitoring the development of patients with ARDS.


Introduction
Acute respiratory distress syndrome is a disease that seriously threatens the health of human lives [1,2]. According to relevant epidemiological investigations, the in-hospital mortality rate of ARDS is as high as 40% [3,4]. Currently, the diagnosis of ARDS is mainly based on the Berlin definition [5]. The Berlin definition was introduced in 2012 and allowed a clear diagnosis of ARDS by stating that, when positive end-expiratory pressure (PEEP) ≥ 5 cmH2O, ARDS can be classified into three categories of increasing severity according to the level of the oxygenation index (P/F): mild (200 < arterial oxygen partial pressure (PaO2)/fraction of inspired oxygen (FiO2) (P/F) ≤ 300), moderate (100 < P/F ≤ 200), and severe (P/F ≤ 100). At present, blood gas analysis is mainly used to measure PaO2 to calculate the P/F value and evaluate the severity of ARDS. However, this method is still limited by some defects [6]. Firstly, the calculation of the P/F value requires blood gas analyses. In the clinical use of arterial indwelling catheters, daily care is difficult, and they are not easy to use in some particular patients, such as newborns and elderly patients [7]. Secondly, arterial blood gas analyses cannot monitor the development of patients with ARDS in real time, which prevents doctors from adopting appropriate respiratory therapy strategies in time and delays the diagnosis and treatment of patients [8].
In recent years, in response to the problems encountered in conducting blood gas analyses, researchers have attempted to use the noninvasive parameter pulse oximetric saturation (SpO2)/FiO2 ratio (S/F) to estimate P/F, thereby achieving noninvasive identification of the severity of ARDS [9][10][11]. At this stage, the single SpO2 parameter was mainly used, and its applicability was restricted to a specific SpO2 range (SpO2 ≤ 97%). A traditional linear regression algorithm [11] was used to construct the prediction model, but its identification performance was not ideal [10,[12][13][14][15][16]. Additionally, it was challenging for such models to provide accurate guidance for medical staff in the clinic [12,17].
Based on a review of the literature [9], we found that when a patient's condition changes, the patient's physiological parameters (such as heart rate, blood pressure, respiratory rate, etc.) change to varying degrees, which provided the idea underlying the aims of our study.
In response to the problems listed above, we extracted a variety of noninvasive physiological parameters from ICU patients and explored the relevance of these parameters for the identification of the level of P/F ratio. An algorithmic model for identifying ARDS disease based on a variety of noninvasive parameters was established to provide medical staff with the reference basis for disease diagnosis. This model uses a feature selection algorithm and a cross-validation model to evaluate the recognition effects of four machine learning algorithms using different subsets of feature values.
Herein, we used a variety of evaluation indicators to assess the ability of different algorithms and feature subsets to identify ARDS. To further investigate the performance of the machine learning algorithms, we used the existing data to classify ARDS with a traditional linear regression model, and we discuss the development of the various methods.

Data sources
Medical Information Mart for Intensive Care III (MIMIC-III, V1.4) is a large, freely available database comprising de-identified health-related data associated with over forty thousand patients who stayed in the critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012 [18]. The database includes information, such as demographics, vital sign measurements obtained at the bedside, laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality records.

Patients and data collection
The patient diagnostic information was recorded in the MIMIC-III database. In the patient screening process, we combined the diagnostic information provided by the database and the Berlin definition to determine whether the enrolled patient was suffering from ARDS, thus ensuring the accuracy of the disease diagnosis. In combination with the Berlin definition and the disease diagnosis, we propose the following conditions: 1) determine whether the patient has a P/F < 300 on the first day of entering the ICU, 2) determine whether the patient underwent chest imaging during his/her presence in the ICU and whether the imaging report was verified, 3) formulate a comprehensive judgment based on the patient's disease diagnosis information.
Combining the above, we propose the corresponding patient selection criteria. This study extracted a variety of noninvasive physiological parameters of patients: demographics (age, gender, height, weight, body mass index (BMI), ethnicity), ICU information (ICU type, length of stay in ICU, admission type, in-hospital mortality), clinical measures (SpO2, temperature, heart rate, blood pressure, Glasgow Coma Scale (GCS)), respiratory system (respiratory rate, tidal volume, minute ventilation volume, peak pressure, plateau pressure, mean air pressure, PEEP, FiO2), and oxygenation indices (P/F, S/F, Oxygenation Index (OI), Oxygenation Saturation Index (OSI)).
This study paid particular attention to the noninvasive physiological parameters of patients. Based on an extensive review of the literature combined with the parameters actually recorded for patients in the database, the following noninvasive physiological parameters were finally used in the identification algorithm: SpO2, temperature, heart rate, blood pressure, GCS, respiratory rate, tidal volume, minute ventilation volume, peak pressure, plateau pressure, mean air pressure, PEEP, FiO2, S/F, OSI, and demographics (age, gender, BMI). These parameters were then converted into 24 features for model training. The main purpose of this study was to identify ARDS by monitoring P/F values through a variety of noninvasive parameters. We used P/F as the outcome variable, with P/F ≤ 300 data points as positive samples and P/F > 300 as negative samples.
In the process of extracting the physiological parameters of patients from the database, we also needed to extract the blood gas analysis outcomes obtained at a specific test time to ensure the accuracy of the identified results. However, this also caused considerable data losses. To mitigate this problem, we allowed data recorded within the two hours following a blood gas analysis to substitute for the respective values at the specific, desired test time.

Methods
This section provides an overview of the adopted methods, which are visually summarized in Fig 1. The dataset for this study was drawn from the MIMIC-III database. After preprocessing, the data were divided into a training set (75%) and a test set (25%). In the model training process, we used the training dataset with cross-validation to evaluate the identification performance of different feature subsets and algorithms, then used the test set to verify the model and compared it with the traditional algorithm.

Handling missing values.
In the process of collecting physiological parameters from patients, we found that some physiological parameters were recorded at a lower frequency, such as the noninvasive blood pressure recordings, resulting in the absence of physiological data at the time of the blood gas analyses. Fortunately, the patient's invasive blood pressure was continually monitored, which could provide data when noninvasive blood pressure data were lacking [19]. Therefore, a random forest was used to impute the patient's missing noninvasive blood pressure values, and other physiological parameters were imputed using k-nearest neighbors (k-NN) [20].
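The k-NN imputation idea can be illustrated with a minimal pure-Python sketch (the paper used a random forest for blood pressure and k-NN for the other parameters; `knn_impute`, the `None` encoding of gaps, and the simple mean-of-neighbours rule are our own illustrative choices, not the authors' implementation):

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest
    rows (Euclidean distance computed on the features both rows observed)."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if v is not None:
                continue
            # candidate neighbours: rows that have feature j observed
            cands = []
            for i2, other in enumerate(rows):
                if i2 == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                d = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                cands.append((d, other[j]))
            cands.sort(key=lambda t: t[0])
            nearest = cands[:k] or [(0.0, 0.0)]
            filled[i][j] = sum(val for _, val in nearest) / len(nearest)
    return filled
```

For example, a row with a missing second feature is filled with the average of its two nearest complete neighbours.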

Oversampling and normalization.
During data preprocessing, we found that using P/F ≤ 300 to divide the dataset into positive and negative samples would result in an imbalanced dataset. Training machine learning algorithms on unbalanced data would bias them toward the class with the larger sample size, making the generalization ability of the algorithm insufficient and affecting the overall performance of the model. For these reasons, we used oversampling to deal with the data imbalance [21]. Current oversampling methods include random oversampling, the Synthetic Minority Oversampling Technique (SMOTE) [22], and the Adaptive Synthetic (ADASYN) sampling approach [23]. Random oversampling addresses data imbalance by randomly resampling the under-represented classes. SMOTE uses the similarities of under-represented samples in the feature space to generate new samples. ADASYN generates different numbers of new samples for different under-represented samples, based on the data distribution; it is an extension of SMOTE but, in practice, tends to focus on outliers. Based on the above analyses, we used SMOTE to deal with the data sample imbalance.
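The SMOTE idea described above, generating synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours, can be sketched as a toy implementation (`smote`, `k`, and `seed` are our own names and defaults, not the library's API):

```python
import random

def smote(minority, n_new, k=1, seed=0):
    """Generate n_new synthetic minority samples: pick a minority sample,
    pick one of its k nearest minority neighbours, and interpolate a
    random fraction of the way toward it."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x inside the minority class (excluding x)
        neigh = sorted((p for p in minority if p is not x),
                       key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        nn = rng.choice(neigh)
        gap = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic
```

With two minority points, every synthetic sample lies on the segment between them, which is exactly the feature-space interpolation the method relies on.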
This study used a variety of physiological patient parameters, each of which was associated with a different range of values. For most machine learning algorithms (such as neural networks), this situation would result in slow learning and make it easier to become trapped in a local optimum, thereby affecting the training outcomes of the algorithm. Therefore, it was necessary to normalize feature values of different orders of magnitude to a common range [a, b]. We used feature scaling to standardize the data in accordance with Eq (1):

X' = a + (b − a)(X − X_min) / (X_max − X_min)  (1)

where X_min represents the minimum and X_max the maximum value of an attribute. The motivation to use feature scaling was its robustness to very small standard deviations of features and the preservation of zero entries in sparse data.

Fig 1. Overview of the experimental process. The raw data in the MIMIC-III database were preprocessed, and the dataset was shuffled and randomly grouped: 75% of the data was used for model training and the remaining 25% for model testing; comparative experiments were then conducted to obtain the final experimental results. (https://doi.org/10.1371/journal.pone.0226962.g001)
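Eq (1)'s min-max feature scaling can be sketched as follows (the constant-feature fallback to the midpoint of [a, b] is our own assumption; it is not specified in the paper):

```python
def min_max_scale(values, a=0.0, b=1.0):
    """Rescale one feature column into [a, b] per Eq (1):
    x' = a + (b - a) * (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: map everything to the midpoint (assumption)
        return [(a + b) / 2.0] * len(values)
    return [a + (b - a) * (v - lo) / (hi - lo) for v in values]
```

For instance, heart rates of 60, 80, and 100 bpm map to 0.0, 0.5, and 1.0 in the default [0, 1] range.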

Feature selection
This study extracted a variety of information and physiological parameters of patients, corresponding to 24 features, but it was unclear which features had a strong correlation with the identification of ARDS. Moreover, the performance of a supervised learning algorithm is correlated with both the number of input features and the correlation between features and the outcome variable. The purpose of feature selection was to identify a subset of features that optimized algorithmic performance relative to the original feature set. There are three types of feature selection algorithms, namely, filter, wrapper, and embedded [24]. The filter method first selects the features of the dataset and then trains the classifier; the feature selection process is independent of the subsequent classifier. In contrast, wrapper feature selection directly uses the performance of the classifier as the evaluation criterion of the feature subset. When many features exist, its computational overhead is usually much larger than that of the filter. The embedded method combines the evaluation of feature importance with the model algorithm itself, resulting in an increased correlation between the feature selection results and the evaluation algorithm; in most cases, the selected features are therefore not applicable to other algorithms [25].
The filter method requires a small number of calculations, and the feature selection result does not depend on the classification algorithm [24]. Filter methods generally evaluate the importance of feature values in three ways: distance, dependency, and information. Based on this, we selected three representative methods for these three aspects: Relief-F, chi-squared, and mutual information.

Relief-F.
The key idea of the Relief-F algorithm was to estimate the quality of attributes according to how well their values distinguished among instances that were close to each other [26]. For example, the quality estimate W[j] of attribute j was updated as shown in Eq (2): if sample x_i belonged to class k, Relief-F first searched for the near-hit x_{i,nh} of x_i among the samples of class k, and then found a near-miss x_{i,nm} of x_i in each class other than the kth class:

W[j] = W[j] − |x_i^j − x_{i,nh}^j| + |x_i^j − x_{i,nm}^j|  (2)
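A minimal two-class Relief sketch makes the near-hit/near-miss update concrete (this is the simplified Relief scheme with a single nearest hit and miss, not the full multi-class, k-neighbour Relief-F the paper cites; `relief_scores` is our own name):

```python
def relief_scores(X, y):
    """Simplified two-class Relief: for each sample, the score of feature j
    increases with its distance to the nearest miss (other class) and
    decreases with its distance to the nearest hit (same class)."""
    n, m = len(X), len(X[0])
    scores = [0.0] * m

    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    for i in range(n):
        hits   = [X[t] for t in range(n) if t != i and y[t] == y[i]]
        misses = [X[t] for t in range(n) if y[t] != y[i]]
        if not hits or not misses:
            continue
        nh = min(hits,   key=lambda p: dist(X[i], p))  # near-hit
        nm = min(misses, key=lambda p: dist(X[i], p))  # near-miss
        for j in range(m):
            scores[j] += abs(X[i][j] - nm[j]) - abs(X[i][j] - nh[j])
    return [s / n for s in scores]
```

A feature that separates the classes scores high, while a constant (uninformative) feature scores zero.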

Chi-squared.
The chi-squared method was based on the χ2 statistic and consisted of two phases. The first phase began with a high significance level for all numeric attributes for discretization, and each attribute was sorted according to its values [27]. The following steps were then performed: 1) calculate the χ2 value in accordance with Eq (3) for every pair of adjacent intervals, and 2) merge the pair of adjacent intervals with the lowest χ2 value:

χ2 = Σ_i Σ_j (A_ij − E_ij)^2 / E_ij  (3)

where A_ij denotes the number of patterns in the ith interval and jth class, and E_ij is the expected frequency of A_ij.
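The χ2 statistic of Eq (3) can be computed from a small contingency table whose rows are adjacent intervals of one attribute and whose columns are classes (a minimal sketch; `chi2_stat` is our own name):

```python
def chi2_stat(table):
    """Chi-squared statistic of Eq (3): sum of (A_ij - E_ij)^2 / E_ij,
    with E_ij the expected count under row/column independence."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, a_ij in enumerate(row):
            e_ij = row_tot[i] * col_tot[j] / total
            if e_ij:  # cells with an expected count of 0 contribute nothing
                chi2 += (a_ij - e_ij) ** 2 / e_ij
    return chi2
```

Two intervals that perfectly separate the classes give a large χ2 (so they are kept apart), while class-independent intervals give χ2 = 0 (so they are merged).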

Mutual information.
Mutual-information-based feature selection (MIFS) was used to measure the amount of information shared between two variables [28]. The mutual information I(X, Y) between two variables X and Y was expressed as Eq (4):

I(X, Y) = Σ_x Σ_y p(x, y) log [ p(x, y) / (p(x) p(y)) ]  (4)
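Eq (4) can be evaluated directly from empirical counts for discrete variables (a minimal sketch in nats; `mutual_information` is our own name):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X; Y) of Eq (4) in nats, from empirical joint/marginal frequencies."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))          # joint counts
    px, py = Counter(xs), Counter(ys)   # marginal counts
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())
```

Perfectly dependent binary variables give I = log 2 nats; independent ones give I = 0.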

Rank aggregation.
To ensure the stability of feature selection, we used a combination of filter feature selection methods [25]. The results of the three algorithms were not in a uniform numerical range, which made it inconvenient to compare the importance of the features. In order to give the three methods equal weight, we normalized their results [29]. The rank aggregation method was formulated as in Eq (5):

R_j = Σ_i F_i(j)  (5)

where F_i(j) is the normalized score of the jth feature under the ith filter feature selection method, and R_j is the final rank score for the jth feature.
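The aggregation step can be sketched as min-max normalizing each filter's scores and summing them per feature (a sketch of Eq (5) under the assumption that the normalization is min-max to [0, 1]; `aggregate_ranks` is our own name):

```python
def aggregate_ranks(score_lists):
    """Normalise each filter's scores to [0, 1], then sum per feature
    (Eq (5)); a higher aggregate score means a more relevant feature."""
    def norm(scores):
        lo, hi = min(scores), max(scores)
        if hi == lo:
            return [0.0] * len(scores)
        return [(s - lo) / (hi - lo) for s in scores]

    normed = [norm(s) for s in score_lists]
    return [sum(col) for col in zip(*normed)]
```

This keeps a filter with a large raw score range (e.g. χ2) from dominating one with a small range (e.g. Relief weights).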

Classification algorithms
This study designed an algorithm that combined feature selection with multiple classification algorithms, used a 10-fold cross-validation model, trained classifiers for different feature subsets, and selected the optimal combination of feature subsets and classifiers, and achieved the identification of the ARDS. This section presents an abridged description of the four classifiers selected for this study.

L2 regularized logistic regression (L2-LR).
In order to prevent overfitting of the classification algorithm, a regularization term was added to the traditional logistic regression cost function J(w, b). Since feature selection was performed by an external filter method, this study used L2 regularization to avoid the weight sparsity induced by L1 regularization [30]. Here, λ is the regularization parameter used to control the weights w. The prediction ŷ_i is given by Eq (6), and the regularized cost by Eq (7):

ŷ_i = σ(w·x_i + b) = 1 / (1 + e^−(w·x_i + b))  (6)

J(w, b) = min_{w,b} [ −(1/m) Σ_{i=1..m} ( y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) ) + (λ/2m) Σ_{j=1..n} w_j^2 ]  (7)

where ŷ_i is defined in accordance with Eq (6), m is the number of samples, and n is the number of features.
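The regularized cost of Eqs (6)-(7) can be evaluated directly (a minimal sketch; `l2_lr_cost` is our own name, and training, i.e. minimizing this cost, is not shown):

```python
import math

def l2_lr_cost(w, b, X, y, lam):
    """Cross-entropy cost of Eq (7) with an L2 penalty (lam/2m) * sum(w_j^2)."""
    m = len(X)
    cost = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + b
        p = 1.0 / (1.0 + math.exp(-z))  # y_hat of Eq (6), the sigmoid
        cost += -(yi * math.log(p) + (1 - yi) * math.log(1 - p))
    return cost / m + (lam / (2 * m)) * sum(wj ** 2 for wj in w)
```

With zero weights, every prediction is 0.5 and the cost is log 2 per sample regardless of λ, a handy sanity check when debugging an optimizer.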

Artificial neural network.
This study used a single-hidden-layer feedforward neural network (SLP-FNN). According to the number of features and the outcome variables, the following network structure was designed: 24 neurons in the input layer, 23 neurons in the hidden layer, and two neurons in the output layer. In order to iterate and train the network quickly, we used a stochastic gradient descent algorithm with an adaptive learning rate to optimize the network parameters. Selecting the rectified linear unit as the activation function effectively prevented vanishing gradients. To prevent overfitting during network training, we used an L2 regularization term, following the same principle as described for L2-LR.

AdaBoost.
The AdaBoost algorithm is a two-class learning method in which the model is an additive model, the loss function is an exponential function, and the learning algorithm is a forward stagewise algorithm. The specific idea of AdaBoost was to increase the weights of samples that had been misclassified by the previous round of weak classifiers, and to reduce the weights of those samples that were correctly classified [31]. As a result, the data that were not correctly classified received more attention from subsequent weak classifiers owing to their increased weight. Herein, G_m(x) is a weak classifier, and α_m indicates the importance of G_m(x) in the final classifier. Eq (8) is a mathematical description of the forward stagewise algorithm, and Eq (9) is the final classifier constructed based on Eq (8):

f_m(x) = f_{m−1}(x) + α_m G_m(x)  (8)

G(x) = sign( Σ_{m=1..M} α_m G_m(x) )  (9)
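One boosting round, computing α_m from the weighted error and re-weighting the samples, can be sketched as follows (a minimal sketch of the standard discrete AdaBoost update with labels in {−1, +1}; `adaboost_round` is our own name):

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost round: alpha_m = 0.5 * ln((1 - err) / err) from the
    weighted error, then w_i <- w_i * exp(-alpha_m * y_i * G_m(x_i)),
    renormalised so the weights sum to one."""
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-12), 1 - 1e-12)  # guard the logarithm
    alpha = 0.5 * math.log((1 - err) / err)
    new_w = [w * math.exp(-alpha * t * p)
             for w, t, p in zip(weights, y_true, y_pred)]
    z = sum(new_w)
    return alpha, [w / z for w in new_w]
```

After the update, the one misclassified sample carries half of the total weight, which is exactly the "more attention to errors" behaviour described above.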

XGBoost.
XGBoost is a scalable machine learning system for tree boosting. The impact of the system has been extensively recognized in a number of machine learning and data mining challenges [32].
Herein, l is a differentiable convex loss function that measures the difference between the prediction ŷ_i and the target y_i, so that the objective at iteration t is given by Eq (10):

L^(t) = Σ_{i=1..n} l( y_i, ŷ_i^(t−1) + f_t(x_i) ) + Ω(f_t)  (10)

The second term Ω(f_t) penalizes the complexity of the model. The additional regularization term helps to smooth the final learned weights to avoid overfitting. Moreover, γ and λ are the regularization parameters used to control the regularization term:

Ω(f) = γT + (λ/2) Σ_{j=1..T} w_j^2  (11)

where T is the number of leaves in the tree and w_j are the leaf weights.
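Under a second-order approximation of this objective, XGBoost scores a candidate split from the gradient/Hessian sums of the two child leaves; the optimal leaf weight is w* = −G/(H + λ). A minimal sketch of that split-gain formula (our own function names; not the library's internals):

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Gain of splitting one leaf into two, under the regularised objective:
    0.5 * [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - (G_L+G_R)^2/(H_L+H_R+lam)] - gamma.
    gamma (per-leaf penalty) and lam (L2 on leaf weights) come from Eq (11)."""
    def score(g, h):
        return g * g / (h + lam)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma
```

A split whose gain is negative (e.g. when γ is large) is pruned, which is how the regularization term curbs tree growth.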

Traditional noninvasive classification method.
Previous studies on the use of noninvasive parameters to identify ARDS focused on the use of the single parameter S/F to fit the P/F value. This study used the linear regression model proposed by Rice et al. [11]. The model used adult SpO2 values (SpO2 < 97%) to fit the P/F values, thus enabling continuous monitoring of the patient's P/F value using noninvasive parameters. The Rice Linear Model is shown in Eq (12):

S/F = 64 + 0.84 × (P/F)  (12)
The noninvasive parameter S/F is used to obtain the predicted P/F value according to Eq (12) so as to classify the severity of ARDS disease, and to obtain the classification result of the traditional algorithm.
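Assuming the commonly cited form of the Rice fit, S/F = 64 + 0.84 × (P/F), the predicted P/F and the resulting ARDS classification can be sketched as (function names are ours; FiO2 is expressed as a fraction):

```python
def rice_pf_from_sf(spo2, fio2):
    """Predicted P/F from the Rice linear fit S/F = 64 + 0.84 * (P/F),
    inverted to P/F = (S/F - 64) / 0.84 (intended for SpO2 < 97%)."""
    sf = spo2 / fio2
    return (sf - 64.0) / 0.84

def rice_is_ards(spo2, fio2):
    """Classify as ARDS-positive when the predicted P/F is <= 300."""
    return rice_pf_from_sf(spo2, fio2) <= 300.0
```

For example, SpO2 = 94.8% on FiO2 = 0.30 gives S/F = 316 and a predicted P/F of 300, i.e. exactly on the diagnostic threshold.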

Performance metrics
According to the diagnostic definition of ARDS, P/F ≤ 300 indicates ARDS. According to this standard, the samples were divided into positive and negative results. Table 1 describes the relationship between the real category and the identified category.
We measured the classification performance based on the average AUC, together with the accuracy (ACC), sensitivity (SEN), specificity (SPE), and balanced error rate (BER), as defined by Eqs (13)-(16), respectively:

ACC = (TP + TN) / (TP + FP + TN + FN)  (13)
SEN = TP / (TP + FN)  (14)
SPE = TN / (TN + FP)  (15)
BER = 1 − (SEN + SPE) / 2  (16)
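Eqs (13)-(16) translate directly into code (a minimal sketch; `metrics` is our own name):

```python
def metrics(tp, fp, tn, fn):
    """ACC, SEN, SPE and BER (Eqs 13-16) from the confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    ber = 1.0 - 0.5 * (sen + spe)
    return acc, sen, spe, ber
```

For example, 80 true positives, 20 false negatives, 90 true negatives, and 10 false positives give SEN = 0.80, SPE = 0.90, and BER = 0.15.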
BER is a balanced metric that equally weights errors in SEN and SPE. We used the BER index to select the optimal feature subset based on a 10-fold cross-validation model [33]. For each algorithm, among the different feature subsets, the subset with the smallest mean BER was chosen as the optimal feature subset (the minimum-BER subset) of that algorithm [29]. The search algorithm for optimal feature subsets is summarized in Algorithm 1 (Fig 2). According to the results of this algorithm, the minimum feature subset of each algorithm was the smallest subset whose mean BER lay within one BER standard deviation of the optimal feature subset. At the same time, the two cases presented above were compared with the use of all features to select the optimal identification result.
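The subset-selection rule just described, take the feature count with the lowest mean cross-validated BER as the optimal subset, then the smallest count whose mean BER lies within one standard deviation of that optimum as the minimum subset, can be sketched as (a paraphrase of Algorithm 1, not the authors' code; `min_ber_subsets` is our own name):

```python
def min_ber_subsets(mean_ber, sd_ber):
    """mean_ber[k] / sd_ber[k]: mean and SD of the 10-fold BER when the
    top (k+1) ranked features are used. Returns (optimal, minimum)
    feature counts, 1-based."""
    k_opt = min(range(len(mean_ber)), key=lambda k: mean_ber[k])
    threshold = mean_ber[k_opt] + sd_ber[k_opt]  # one-SD tolerance band
    k_min = next(k for k in range(len(mean_ber)) if mean_ber[k] <= threshold)
    return k_opt + 1, k_min + 1
```

On a BER curve that bottoms out at four features, the rule may accept only three features if their BER sits inside the one-SD band, trading a negligible loss for a smaller model.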

Results
We identified 8702 patients who met our inclusion criteria from a total of 46476 patients enrolled in the MIMIC-III database. Fig 3 is a flowchart outlining the patient selection and detailing the number of patients and the data selection process. There were 6601 patients (148414 data points) in the training set and 2101 patients (47352 data points) in the test set.
The demographics and utilization characteristics are summarized in Tables 2 and 3. Table 2 summarizes the demographic information of the patients. The training set has a patient distribution consistent with that of the test set. In the training set, the patients were hospitalized in different intensive care units: CSRU (2231, 33.8%), MICU (1851, 28.4%), SICU (927, 14.04%), TSICU (904, 13.09%), and CCU (688, 10.42%); the average age of the patients was 65.14. The majority of the patients were male (58.64%). The patient in-hospital mortality rate was 16.34%. Table 3 summarizes the distribution of the physiological parameters of patients in the training and test sets. As observed, there is a large difference between the positive samples (P/F ≤ 300) and the negative samples (P/F > 300) within the same dataset. The training and test sets were randomly grouped and share a common distribution; there is no significant difference between them. Table 4 presents the normalized scores provided by the three filter methods under consideration. The importance of the features in this study is relative to the level of the oxygenation index. The closer a score is to one, the more relevant the feature.

Feature selection result
The MIFS criterion showed that a number of parameters were relevant, while the Relief-F and chi-squared tests were more conservative and indicated that SpO2 and S/F were the more important and relevant features. The ranking of the final features is listed in Table 4. The combined score was calculated based on Eq (5). According to the combined score, SpO2 is clearly more relevant than the rest of the parameters. Furthermore, S/F, FiO2, and PEEP are also likely to be highly relevant features.

Fig 3. Patient selection flowchart. According to the ARDS diagnostic criteria, the appropriate enrolled population was selected from more than 40,000 patients in the MIMIC-III database; 8702 eligible patients were finally included, and the dataset was randomly divided into training and test sets.

Algorithmic evaluation
Using the training dataset, 10-fold cross-validation was used to evaluate the performance of the four algorithms. According to the feature ranking results in Table 4, the features were substituted into the four algorithms in turn, the BER of each algorithm was used to select the feature subset, and the algorithms were compared based on the AUC. As shown in Fig 4, the BER of the four algorithms changes as a function of the number of features, and the average BER results of the four classification algorithms are listed for different feature subsets. The gray area represents the standard deviation of the BER. The red triangle and green dot marks and their corresponding numbers represent the minimum and the optimal feature subsets, respectively. We found that the BER of the four algorithms decreased considerably when the first five features were added to the model, but as further features were added, the BER decreased slowly. For SLP-FNN, L2-LR, and AdaBoost, almost all features were needed to achieve the minimum BER. Compared to these three algorithms, XGBoost achieved the minimum BER at the 13th feature; as the number of features increased further, the BER began to rise. We selected the smallest number of features for which the mean BER was within one standard error of the minimum BER (subset selection threshold). According to this standard, we found the optimal and smallest feature subsets of the four algorithms: L2-LR (24, 18), SLP-FNN (24, 20), AdaBoost (23, 21), XGBoost (12, 6).

Performance of classification algorithms

Training dataset.
Based on the selected features, we obtained the minimum, optimal feature subset for the training set. We used training data for the minimum, optimal, and all feature subsets. Four classification algorithms were trained using 10-fold cross-validation. The results are shown in Table 5.
By comparing the results of the optimal and minimum feature subsets, we found that the minimum feature subset was determined using the minimum BER and standard deviation, but with the use of fewer feature quantities (reducing certain data information). However, there was no significant decline in AUC. Fig 5 shows the AUC results for each algorithm with the use of different feature subsets.

Test dataset
The test set was completely independent of the data used for feature selection and model training. When the test set was used to evaluate the performance of the algorithms, we added a traditional noninvasive identification algorithm to compare the traditional and machine learning approaches. The final results are shown in Table 6. The ROC curves of the five algorithms are shown in Fig 6. Based on the results, the overall performance of the traditional algorithm exhibits a clear gap with respect to the machine learning algorithms. The AUC (0.7354) of the Rice Linear Model is much lower than the AUC of L2-LR under the minimum feature subset (0.8156).
In this study, we analyzed the classification ability of the features, used filter methods to rank feature importance, and used the MIN-BER-FS algorithm to find the optimal and minimum feature subsets. In the algorithmic evaluation, we compared the experimental results of the minimum, optimal, and all-feature subsets, which reflects the ability of each algorithm to mine information from different aspects. During the feature traversal, the observed feature contributions were consistent with the feature ranking results. From the results, XGBoost performed best under the optimal subset. At the same time, XGBoost achieved better results than the other three algorithms on the minimum feature subset (only the first six features). Observing the variation of the standard deviation during training, we can see that AdaBoost had the best stability.

Discussion
In this study, a novel identification algorithm was presented that combined multiple noninvasive physiological parameters with machine learning algorithms to estimate P/F ratio levels. First, we used the MIMIC-III database to extract the SpO2, PaO2, and FiO2 that are commonly used to identify ARDS. At the same time, we extracted a variety of other noninvasive physiological parameters relevant to the patient. In terms of feature selection, the filter method was selected, as its feature selection is independent of the subsequent models. We used a variety of feature selection algorithms (Relief-F, chi-squared, MIFS) to filter the features, combined with the rank aggregation method, to obtain the final feature ranking results [25]. In the process of designing the ARDS identification algorithm, we used the cross-validation model to evaluate the average BER of the four algorithms (L2-LR, SLP-FNN, AdaBoost, XGBoost) in the optimal feature subset and the minimum feature subset according to the results of feature sorting, so as to comprehensively consider the number of features and the identification results, thereby allowing the choice of the most suitable combination [29,33,34]. Selection of the minimum number of features implied the elimination of features that were insensitive to the accuracy of identification, simplification of the identification algorithm in actual use, and savings in computation time. Conversely, the accuracy of the identification algorithm cannot be sacrificed.

Fig 4. BER of the four algorithms as a function of the number of features. The x-axis is the feature number, the y-axis is the mean BER of the ten-fold cross-validation, and the gray shaded area is the BER standard deviation of the ten-fold cross-validation under a specific feature subset. The figure shows the trend of the BER of the four algorithms as features are added step by step. The position of the green circle is the optimal feature subset of the algorithm, and the red triangle is the smallest feature subset. (https://doi.org/10.1371/journal.pone.0226962.g004)
Regarding research on the noninvasive identification of the severity of ARDS, most current work is focused on the relationship between S/F and P/F [9,10,12,35]. S/F and P/F did exhibit strong correlations, but the use of S/F alone for regression analyses led to a large error in the classification of the severity of ARDS [6], and there was a range restriction on SpO2 (SpO2 < 97%) [10,11]. Some researchers have found that P/F was affected by other parameters, such as the possible connection to a ventilator, or the modification [9]. At the same time, when P/F changed, some physiological parameters of the patient (such as heart rate, respiratory rate, etc.) also changed [7]. Based on the above analyses, this study considered a variety of noninvasive physiological parameters obtained from patients. In the feature selection design, we did not use a single algorithm, but chose a variety of algorithms and integrated their results, to prevent a single ranking algorithm from making the feature ranking less accurate. We selected three representative methods in accordance with distance, dependency, and information, and normalized the three feature ranking results to calculate the final feature ranking outcome. Compared to a single method, this ranking result is more stable and reliable. Table 5 shows the results on the training set. L2-LR achieved its minimum BER when all the features were used, yielding an AUC = 0.8268. The neural network also reached its minimum BER when all the features were used, and its recognition performance was slightly better than L2-LR, yielding an AUC = 0.8464. Both AdaBoost and XGBoost are boosted-tree algorithms. The identification results of the boosted-tree algorithms were better than those of logistic regression and the neural network (single hidden layer). When AdaBoost used 23 features, the BER outcome was the smallest, yielding an AUC = 0.8694.
When XGBoost used 12 features, the BER outcome was minimal and yielded an AUC = 0.9282. Using the average minimum BER to find the minimum feature subset, it was found that reducing the number of features to a certain extent did not affect the recognition performance of the algorithm. In this respect, the advantage of XGBoost is obvious. With the use of six features, the accuracy rate only dropped by 0.42%. Combined with Fig 4, we found that the first six features (SpO 2 , S/F, FiO 2 , PEEP, mean air pressure, respiratory rate) contributed considerably to the identification algorithm, and the BER decrease was more distinct. After the addition of the features, the BER decreased gradually.
In the test set, we introduced a traditional linear regression algorithm [11] to evaluate the recognition performance of the classification algorithms. The performance of the four algorithms on the test set was basically consistent with the results on the training set, showing good generalization ability on the single-center independent dataset. The Rice Linear Model yielded an AUC = 0.7738 and ACC = 70.67%, which are far from the corresponding results elicited by the machine learning algorithms. According to the literature published by Rice [11], the Rice Linear Model was derived under the premise of SpO2 < 97%. This study directly applied the model to the existing data (with no limitation on the range of SpO2), so some deviation was expected. In actual clinical practice, there are often ARDS patients with SpO2 > 97% but P/F ≤ 300. In this case, the Rice Linear Model cannot provide doctors with auxiliary diagnostic decisions, whereas our algorithmic model can overcome this shortcoming of traditional methods. The application scenario for this algorithm is shown in S1 Fig. The patient's oxygenation level (≤300 or >300) is identified by collecting mechanical ventilation parameters and physical signs to assist the physician in the diagnosis of ARDS without blood gas analysis. For patients who have already been diagnosed with ARDS, the algorithm is used to monitor the patient's oxygenation index level in real time, so that the doctor can adjust the ventilator treatment plan at any time. On the other hand, the MIN-BER-FS algorithm can significantly reduce the amount of computation, making it easy to port the algorithm to the ordinary microprocessors of ventilators and monitors, which can realize more intelligent aided diagnosis.
There were also some limitations associated with this study. The MIMIC-III database used here is a single-center database; even though we separated the training set from the test set in the experimental design, external validation is still needed to ensure that the model works well in different hospitals. The database contains no data on bilateral pulmonary infiltrates of non-cardiogenic origin, and very few patients had a clear diagnosis of ARDS. During patient screening, we therefore selected, as far as the Berlin definition and the database allowed, patients who met the diagnostic criteria of ARDS, a process that may have introduced some confounding factors. The population of the clinical database was concentrated around the age of 55, with most patients middle-aged or elderly, which may bias the trained model toward elderly patients and cause deviations in younger populations.
Dataset imbalance is a ubiquitous problem in clinical research, and we adopted oversampling as a compromise. Oversampling does not solve the problem fundamentally; it can only alleviate, to some extent, the deviation of the results caused by data imbalance. The problem can be fundamentally solved only by expanding the dataset. Missing data is another important problem in all modeling efforts, especially in the healthcare domain. The MIMIC database also has missing-data problems, for example in the patient's noninvasive blood pressure, airway pressure, and other physiological parameters. If the samples with missing data are omitted, many samples are lost; if technical means are used to handle the missing data, most samples can be retained. However, both approaches introduce deviations into the model: the former through the loss of part of the patient data, and the latter through the errors introduced by interpolation-based estimation of the missing values [19].
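The two preprocessing issues discussed above can be sketched with generic stand-ins: random minority oversampling for class imbalance and column-mean imputation for missing values. These are minimal illustrations under simplifying assumptions (binary labels, numeric features, `None` marking a missing entry), not the study's exact pipeline.

```python
import random

def oversample_minority(X, y, seed=0):
    """Duplicate random minority-class samples until both classes balance."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for x, label in zip(X, y):
        by_class[label].append(x)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    majority = 1 - minority
    deficit = len(by_class[majority]) - len(by_class[minority])
    by_class[minority] += [rng.choice(by_class[minority]) for _ in range(deficit)]
    X_out = by_class[0] + by_class[1]
    y_out = [0] * len(by_class[0]) + [1] * len(by_class[1])
    return X_out, y_out

def impute_mean(rows):
    """Replace None entries with the mean of the observed values in
    that column (a simple interpolation-style estimate)."""
    cols = list(zip(*rows))
    means = [sum(v for v in c if v is not None) /
             sum(1 for v in c if v is not None) for c in cols]
    return [[m if v is None else v for v, m in zip(r, means)]
            for r in rows]
```

As the text notes, both steps trade one bias for another: duplicated minority samples add no new information, and imputed values carry estimation error into the model.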
This study was exploratory, aiming to investigate whether noninvasive parameters can identify ARDS patients, and to use feature selection techniques to determine which noninvasive parameters correlate most strongly with the oxygenation level. In the future, we will include more patients with ARDS and develop multi-classification methods to achieve continuous identification of ARDS severity. The outcomes of this study are expected to provide some ideas for future related research.
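The multi-classification extension mentioned above would amount to mapping an estimated P/F value onto the Berlin definition's severity grades. A minimal sketch, using the thresholds quoted in the Introduction (and assuming the PEEP ≥ 5 cmH2O precondition holds):

```python
def berlin_severity(pf_ratio):
    """Classify a P/F value per the Berlin definition's oxygenation
    grades (assumes PEEP >= 5 cmH2O): severe <= 100 < moderate <= 200
    < mild <= 300."""
    if pf_ratio <= 100:
        return "severe"
    if pf_ratio <= 200:
        return "moderate"
    if pf_ratio <= 300:
        return "mild"
    return "above ARDS oxygenation criterion"
```

A multi-class model would predict these grades directly from the noninvasive parameters rather than thresholding a single estimated P/F.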
In the future, an early warning system for ARDS severity, suitable for monitors and ventilators, will be developed using a multi-classification algorithm.

Conclusion
In conclusion, the overall classification performance of the machine learning algorithms was better than that of the traditional algorithm, and among the machine learning algorithms, XGBoost was significantly better than the other three. Feature sorting and feature selection algorithms help us understand the characteristics of ARDS, identify which features correlate best with the oxygenation level, and design high-precision algorithms. The method can continually provide clinicians with auxiliary diagnostic suggestions.