Discovery of Urinary Proteomic Signature for Differential Diagnosis of Acute Appendicitis

Acute appendicitis is one of the most common acute abdomens, but the confident preoperative diagnosis is still a challenge. In order to profile noninvasive urinary biomarkers that could discriminate acute appendicitis from other acute abdomens, we carried out mass spectrometric experiments on urine samples from patients with different acute abdomens and evaluated diagnostic potential of urinary proteins with various machine-learning models. Firstly, outlier protein pools of acute appendicitis and controls were constructed using the discovery dataset (32 acute appendicitis and 41 control acute abdomens) against a reference set of 495 normal urine samples. Ten outlier proteins were then selected by feature selection algorithm and were applied in construction of machine-learning models using naïve Bayes, support vector machine, and random forest algorithms. The models were assessed in the discovery dataset by leave-one-out cross validation and were verified in the validation dataset (16 acute appendicitis and 45 control acute abdomens). Among the three models, random forest model achieved the best performance: the accuracy was 84.9% in the leave-one-out cross validation of discovery dataset and 83.6% (sensitivity: 81.2%, specificity: 84.4%) in the validation dataset. In conclusion, we developed a 10-protein diagnostic panel by the random forest model that was able to distinguish acute appendicitis from confusable acute abdomens with high specificity, which indicated the clinical application potential of noninvasive urinary markers in disease diagnosis.


Introduction
Acute appendicitis is one of the most common surgical emergencies in clinic worldwide [1]. However, the precise diagnosis of acute appendicitis is still a challenge, especially in children and childbearing women. The absence of classical clinical signs presents in 30~45% of patients [2]. Also, similar symptoms of other acute abdominal inflammations (such as gastrointestinal disorders, cholelithiasis, and gynecological diseases) with appendicitis lead to misdiagnosis and high negative appendectomy rates, which is reported to a range between 10% and 32% [3][4][5]. In addition, the missed diagnostic rate of acute appendicitis ranges from 11% to 28% [6,7], and the delays in diagnosis and treatment would lead to the increased risk of appendix rupture, abscess, and even septicemia [8].
For now, the diagnosis of acute appendicitis mainly depends on patient symptoms and serological results, such as white blood cell count (WBC), C-reactive protein (CRP), and neutrophil cells. However, these tests did not present efficient sensitivity and specificity for appendicitis diagnosis [9,10]. Imaging examinations such as ultrasonography and computed tomography (CT) are also used to assist diagnosis. Ultrasonography has moderate specificity (81%, 95% CI: 78-84%) [11] in detecting acute appendicitis, limiting its diagnostic ability. Though CT examination is more accurate [11], the strong radiation risk and high cost make it unsuitable for wide application. Additionally, several risk scores by combing clinical signs have been developed to improve the predictive ability for appendicitis. The Alvarado score is one of the most well-known scoring systems; it has good sensitivity (99%, 95% CI: 97-99%) but low specificity (43%, 95% CI: 36-51%) [12]. Thus, efficient diagnostic methods for discrimination of appendicitis from other confused acute abdomens are in urgent need.
In this study, a high throughput urinary proteome analysis platform was employed to get deep profiling of urine samples from patients with acute abdomens. The urinary proteomic data were analyzed using several classification strategies for diagnosis of acute appendicitis with other mimic acute abdomen samples. Among these models, a random forest diagnosis model with a panel of 10 urine proteins achieved the best diagnostic result. Without using clinical signs and imaging examinations, the random forest model got an accuracy of 83.6% in the validation dataset, and the sensitivity and specificity were 81.2% and 84.4%, respectively. Hence, our study provides potential urinary markers and efficient model for acute appendicitis diagnosis.

Materials and Methods
2.1. Patient Samples. The study was approved by the Tianjin Baodi Hospital Ethical Review Committee. Urine samples from 134 patients with acute abdomens were collected between 2013 and 2016 from Tianjin Baodi Hospital, including 48 cases with acute appendicitis (AA) and 86 cases with control diseases (cholecystitis and cholelithiasis (CHO), pancreatitis (PAN), gastrointestinal perforation (GP), intestinal obstruction (IO), and other acute abdomens (OTH)). Patients were diagnosed according to blood tests, radiological studies (CT and/or ultrasound), or pathological results after surgery. Patient information was shown in Table 1, and the sample metadata were summarized in Table S1.

Urine Sample Preparation.
Midstream urine was obtained from all donors before treatment and stored at -80°C. Urine samples were prepared according to a previously described method ( Figure S1) [26]. Briefly, 8.9 ml of urine was centrifuged at 200,000g for 70 min. The pellet was resuspended and heated at 56°C for 30 min to remove uromodulin. After heating, the solution volume was adjusted to 8.9 ml and centrifuged at 200,000g for 70 min to get the final pellet. Then urea buffer (8 M urea, 50 mM Tris, 75 mM NaCl, pH 8.2) was added to resolve the pellet. About 10 μg of protein from each sample was separated by SDS-PAGE in 10% gel. Each gel lane was cut into 3 bands. Each band was cut into small gel pieces, distained with distaining buffer (50% acetonitrile, 50% 50 mM NH 4 HCO 3 ), and then washed with 75% acetonitrile and pure water. Gel pieces were digested with trypsin at 37°C overnight.  All assigned peptides were filtered with 1% false discovery rate (FDR). Only proteins with ≥2 strict peptides (1% FDR and ion score > 20) were kept for quantification. Intensity-based absolute quantification (iBAQ) algorithm was used for protein quantification. To normalize the differences in loading amounts among samples, the iBAQ value was converted to iFOT (fraction of total, iBAQ value of each protein divided by the sum of all iBAQ values of all proteins in the sample). For better visualization, iFOTs were multiplied by 10 5 [27]. To eliminate the difference between different groups, the iFOTs were normalized by quantile algorithm [28].

Feature Selection and Construction of Machine-Learning
Models. In the discovery dataset, normalized iFOTs were log2 transformed, and the missing values were imputed with a minimal value. For the selection of feature, firstly, AA and CON outliers were screened with the normal reference intervals (RIs) built with 495 normal urine samples [26]. The criterion of outlier is that the iFOT value of a protein is more than 95% CI of normal RI plus 2 folds of interquartile range (IQR). Outlier proteins in each sample were identified and then were further screened depending on the iFOT cutoff value and frequency in the disease group to build diseaserelated outlier pool. Outlier pool selection standards: (1) the iFOT of the outlier is more than 75% percentile of all the outlier iFOT cutoff values, and it should be outlier in at least 90% percentile of all the outlier frequencies in the AA group or the CON group; (2) if the protein is not expressed in healthy urinary samples, it should be outlier in at least 95% percentile of all the outlier frequencies in the AA group or the CON group. AA outlier pool and CON outlier pool were established, respectively. Then, the nonparametric Wilcoxon test was performed between proteins in the two outlier pools; proteins with p value < 0.05 were applied in feature selection algorithm for further screening. Ten proteins were selected and applied in the construction of classification models with different machine-learning algorithms, including random forest (RF), support vector machine (SVM), and naïve Bayes (NB). Leave-one-out cross validation was used in the evaluation of model robustness. To evaluate the prediction effect, the classification models were verified in the independent validation dataset.

Statistical Analysis. R software (version 3.5.3) and
RStudio (version 0.99.489) were used in the construction outlier models and machine-learning models, statistical analysis, receiver operating characteristic (ROC) analysis, and plotting. Gene ontology (GO) analysis was performed with DAVID (https://david.ncifcrf.gov/).

Proteomic Profiling of Acute Appendicitis (AA) and
Control Acute Abdomen (CON) Samples. A total of 134 samples passed quality control in protein identification (more than 800 proteins in each sample) were used for data analysis, including 48 AA samples and 86 CON samples (Table 1). High instrument quantification reproducibility was achieved with an average Pearson correlation coefficient of 0.88 between quality control samples. Totally, 5335 proteins were identified and quantified (Table S2), and an average of 1561 ± 432 proteins was quantified in each sample. Among them, 3834 proteins were identified in both groups. There were 138 unique proteins in the AA group and 1363 unique proteins in the CON group (Figure 1(a)). There were 198 (5.0%) proteins identified in all AA samples, 183 (3.5%) in the CON group, and 142 proteins identified in all the samples (Figure 1(b)). The distribution of proteins showed that the CON group had much more low frequency (<10% frequency) proteins (43.7%) than the AA group (31.4%) ( Figure S2). What is more, the highly expressed proteins with more than 90% frequency (AA: 449 proteins; CON: 542 proteins) had a greater proportion than the other groups with more than 20% frequency, suggesting that there were a considerable percentage of "core" urinary proteins widely expressed in different urine samples.

Reliability Demonstration of Profiling Data in Acute
Pancreatitis. To assure whether the urinary proteomic method could be applied in the diagnostic biomarker discovery for nonurogenital disease, we checked the classification ability of our data for acute pancreatitis (PAN) from other acute abdomens because there were commonly accepted reliable urinary markers for pancreatitis. The diagnostic capabilities of the urinary level of amylase (the expression products 3 BioMed Research International of AMY2A and AMY2B) and trypsinogen-2 (the expression product of PRSS2) had been proved in many researches [29,30]. We found that AMY2A, AMY2B, and PRSS2 were significantly highly expressed in urine samples from pancreatitis patients compared with those from other acute abdomen patients (Figures 2(a)-2(c), Kruskal-Wallis test, p < 0:05). The AUCs of the three proteins between pancreatitis and the rest of the acute abdomen samples were all above 0.75 (Figure 2(d), Table 2). The AUC of AMY2A was 0.83, which was the best one among these three proteins, showing its outstanding ability in diagnosing pancreatitis. This result demonstrated that the urinary proteome system could produce reliable datasets, and proteins derived from nonurogenital tissues could be detected and beneficial in disease diagnosis.

Feature Selection for Classification of AA and CON.
Based on the type of mass spectrometers (MS), the dataset was divided into discovery dataset and independent validation dataset (Table S2). In the discovery dataset, a total of 73 urine samples including acute appendicitis (AA, n = 32) and control acute abdomen (CON, n = 41) specimens that processed with multiple MS types were used for feature selection and classification model construction. The independent validation dataset consisted of 16 AA and 45 CON specimens, which were all produced by the same MS type (Orbitrap-QE-Plus).
In order to select the features for classification (Figure 3(a)), AA outliers and CON outliers were identified, respectively, in the discovery dataset against a normal urine database including 495 samples. Outliers were proteins with expression that is higher than 95% confidence interval (CI) of reference interval (RI) plus 2 folds of interquartile range (IQR) of RI. The frequently detected outliers could be considered as markers to reveal changes under pathological conditions. With further screening by frequency (shown in Methods), 287 proteins were selected as AA outliers, and 322 were CON outliers (Table S2).  For the above two outlier pools, Gene ontology (GO) analysis was conducted (Figures 3(b), 3(c), and S3). Among the top 5 processes, three items including platelet degranulation, acute-phase response, and cellular oxidant detoxification were enriched both in AA outliers and CON outliers (Figures 3(b) and 3(c)). For proteins enriched in acutephase response (ranked second and third in AA outliers and CON outliers, respectively), eight proteins were shared by AA and CON; one protein (ITIH4, inter-alpha-trypsin inhibitor heavy chain family member 4) was CON-specific. Hence, in order to distinguish different diseases, specific proteins in outliers should be further filtered as classification features. The 151 significantly differential proteins between the two groups (nonparametric Wilcoxon test, p value < 0.05, Table S2) were used for feature selection algorithm. Ten proteins (LYVE1: lymphatic vessel endothelial hyaluronan receptor 1; AHCYL1: adenosylhomocysteinase-like 1; APOC1: apolipoprotein C1; SECTM1: secreted and transmembrane 1; SLC31A1: solute carrier family 31 member 1; ITGA6: integrin subunit alpha 6; SLC35F2: solute carrier family 35 member F2; GPX3: glutathione peroxidase 3; TMEM14C: transmembrane protein 14C; SLC47A2: solute carrier family 47 member 2) were further selected for the establishment of classification models, and their expressions were shown in Figure S4.

Construction and Evaluation of Classification Models.
However, AUCs of the ten single-feature-based classification models were all less than 0.75 (Figure 4(a)), indicating the limitation of single feature in classification. Aiming at model optimization, machine-learning algorithms were further adopted to integrate multiple features for model construction, including random forest (RF), support vector machine (SVM), and naïve Bayes (NB) (Figure 4(b)). The robustness of these 3 models was evaluated by leave-one-out cross validation, and the accuracies of the 3 models were 84.9%, 83.6%, and 79.4%, respectively. In the independent validation dataset, 16 AA and 45 CON specimens were included. The RF model could correctly distinguish 83.6% of samples (sensitivity: 81.2%, specificity: 84.4%), SVM model had an accuracy of 78.7% (sensitivity: 25.0%, specificity: 97.8%), respectively. The RF model achieved the best performance among the 3 models and showed advantage in differential classification.

Discussion
Acute appendicitis is one of the most significant acute abdomen diseases in clinic, but the diagnosis of acute appendicitis    BioMed Research International is still difficult. In our study, a urinary proteomic system was applied to discover urinary markers for differential diagnosis of AA with other acute abdomens. A panel of 10 proteins integrated by the RF model achieves 84.9% of accuracy in leave-one-out cross validation and 83.6% in the independent validation dataset, which was better than the reported results of WBC (sensitivity 62%, specificity 75%) and CRP (sensitivity 57%, specificity 87%) [10]. Notably, without relying on any clinical sign, this RF model using urine markers showed high specificity in AA diagnosis, which was helpful in reducing negative appendectomy. Among the 10 features in this diagnostic model, two features had AUCs higher than 0.7, including LYVE1 (lymphatic vessel endothelial hyaluronan receptor 1) and AHCYL1 (adenosylhomocysteinase-like 1). LYVE1 is an immunity-related factor that may play a role in the migration of immune cells, and the loss of LYVE1 is a sign of activation of inflammation [31,32]. The expression of LYVE1 is lower in the AA group than in the CON group. Since the appendix is also an immune organ, this may indicate that inflammation is more easily activated in AA. AHCYL1 is upregulated in the CON group, maybe because it acts as a regulator of fluid secretion [33], which is important for the secretary organs in the CON group. APOC1 (apolipoprotein C1) is a feature that upregulated in the AA group. APOC1 can bind to lipopolysaccharide (LPS) on the outer-membrane of gram-negative bacteria to present LPS to macrophages, thereby stimulating the inflammatory response [34]. The upregulation of APOC1 might reflect the existence of bacterial infection in AA than the diseases in the CON group, such as cholecystitis and pancreatitis.
In this study, we found that the urinary protein panel and the random forest model were powerful in the diagnosis of acute appendicitis with the other acute abdomens. The advantages of random forest algorithm, such as nonoverfitting, robustness to noise, make it a robust tool in classification. Our results also demonstrated the potential of urinary protein in the diagnosis of nonurogenital diseases, as the change of acute pancreatitis biomarkers could be well detected in different disease groups. With the relatively high specificity, our result might provide beneficial supplement to the clinical auxiliary diagnosis of acute appendicitis. However, there are still areas that need further improvement. Firstly, this urinary diagnostic model should be further validated in a larger number of samples. Secondly, for facilitation in potential clinical application, the feature proteins need to be absolutely quantified. Except for the immunological methods, the MS-based method for quantification is becoming popular, with the increasingly high-throughput detection of target proteins independent on antibodies. Lastly, since urine is rich of small metabolites, the combination of urinary protein makers with metabolites might contribute to improving diagnostic efficiency.

Conclusions
In summary, a urinary proteomic system was applied to discover urinary markers for differential diagnosis of AA with other acute abdomens. A panel of 10 proteins integrated by the RF model achieves 84.9% of accuracy in leave-one-out cross validation and 83.6% in the independent validation dataset. Without relying on any clinical sign, this RF model using urine markers showed high specificity in AA diagnosis, which was helpful in reducing negative appendectomy, indicating the potential of noninvasive urinary markers in clinical application.

Data Availability
The data used to support the findings of this study are available from the corresponding authors upon request.

Conflicts of Interest
All the authors declare no conflict of interest.