Antibody Profiling of Kawasaki Disease Using Escherichia coli Proteome Microarrays

non-fever control (NC), fever control (FC), KD before intravenous immunoglobulin treatment (KD1), KD at least 3 weeks after treatment (KD3). This study is the first to profile plasma antibodies in KD and to demonstrate that an E. coli proteome microarray can screen for differences among patients with KD, non-fever controls, and fever controls.


Introduction
Kawasaki disease (KD) is a form of acute systemic vasculitis syndrome with an unknown etiology that most commonly occurs in children under the age of 5 years.
KD is characterized by prolonged fever, bilateral conjunctivitis, diffuse mucosal inflammation/strawberry tongue, polymorphous skin rashes, indurative edema of the hands and feet associated with peeling of the fingertips, and non-suppurative neck lymphadenopathy (1). The most serious complication of KD is the formation of coronary artery lesions. Coronary artery aneurysm develops as a sequela of the vasculitis in 20-25% of untreated children, but a single high dose of intravenous immunoglobulin (IVIG) can lower the incidence of aneurysm to 3-5% (2).
Physicians diagnose KD according to the duration of fever and five main clinical symptoms, but this method may not be objective enough. Although some laboratory tests, such as C-reactive protein and erythrocyte sedimentation rate, may help to confirm inflammation (3), they are not specific markers for KD. To date, no biochemical diagnostic kit is available for KD. It has been reported that KD patients show antibody differences in their blood, such as anti-endothelial cell antibodies and anti-neutrophil antibodies (4)(5)(6). However, other groups did not find these antibody differences (7). These conflicting results indicate the need for high-throughput techniques to screen for antibody markers.
More than 85% of KD cases occur in children under the age of 5 years. In infancy and early childhood, the gut microbiota plays an important role in immune homeostasis and autoimmunity (8)(9)(10). Since E. coli is a common commensal intestinal bacterium in humans, it is one of the first foreign substances that a newborn encounters. Therefore, E. coli is likely to contribute to humoral immune responses in infants and may be helpful for identifying KD disease markers. We used high-throughput techniques to express and purify ~4200 E. coli proteins and developed E. coli proteome microarrays (11)(12)(13)(14). These E. coli proteome microarrays have been used successfully to profile antibodies in the bloodstream and to identify antibody biomarkers in inflammatory bowel disease (12) and bipolar disorder (14). Since most KD symptoms have been correlated with immune dysfunction, we hypothesized that antibodies in the plasma behave differently in KD patients. Furthermore, the antibodies may be altered in KD patients after treatment and recovery. To test our hypothesis, we measured the amount of antibody in the plasma and profiled the antibodies using thousands of E. coli antigens.
In this study, we found different antibody profiles in KD patients and we confirmed some antibodies with high accuracy in the single-blind test.

Patients and controls
In this study, the KD subjects (n = 60) consisted of children who met the criteria for KD (15) and were treated with IVIG at Kaohsiung Chang Gung Hospital. The patients were initially treated with a single dose of IVIG (2 g/kg) over a 12-hour period, as in our previous reports. Blood samples were obtained prior to IVIG treatment (pre-IVIG, KD1), within 3 days following IVIG treatment (post-IVIG, KD2) as acute stage samples, and at least 3 weeks following IVIG (KD3). We excluded patients whose symptoms did not match the KD criteria, who had an acute fever for less than 5 days, or who had an incomplete collection of pre- and post-IVIG blood samples.
Age-matched febrile patients (n = 60) and non-fever children (n = 60) from the outpatient clinic were included as controls. Febrile control patients consisted of children admitted to the ward with an upper and/or lower respiratory tract infection. Blood samples were collected in 10 mL heparin tubes (BD, catalog No. 367874) and kept at room temperature, and plasma was harvested by centrifugation at 3000 rpm for 10 minutes within 24 hours. Plasma samples were divided into several aliquots and stored immediately in a -80 °C freezer until detection. The Institutional Review Board of Kaohsiung Chang Gung Memorial Hospital (IRB No. 102-1015A3) approved this study. Informed consent was obtained from the parents or guardians of all individuals prior to carrying out the study.

Measuring plasma IgG
We determined plasma IgG concentrations using the anti-human IgG ELISA kit (eBioscience) in accordance with the manufacturer's instructions. In short, 96-well plates were coated with a capture antibody, washed, blocked, and incubated with plasma diluted 750,000-fold. After several washes, we applied the detection antibody and incubated the plates for another hour with shaking. After several more washes, we added tetramethylbenzidine followed by the stop solution and acquired data at OD 450 nm using an ELISA reader (BioTek, Synergy 2).

Creating E. coli proteome microarrays
We modified the methods for high-throughput protein expression, purification, and printing from our previous study (11). In brief, His-tagged proteins were overexpressed from the E. coli ASKA collection and purified with resin beads in a high-throughput manner. To create the E. coli proteome microarrays, ~4200 purified proteins were printed on each aldehyde slide in duplicate using SmartArrayer 136 (CapitalBio). Protein microarrays were immobilized at 4 °C for 8 h and then stored at -80 °C for future use.

Antibody profiling using E. coli proteome chips
Before use, the E. coli proteome chips were thawed in TBST and blocked with 3% BSA for 1 hour. Plasma samples were diluted to 1.25 mg/dl IgG in TBST containing 1% BSA, based on the concentrations determined with the ELISA kit. The blocked proteome chips were incubated with the diluted plasma for 1 hour with shaking at 50 rpm. After washing 3 x 10 min with TBST, the chips were incubated for 1 hour with an anti-human antibody cocktail, which consisted of Dylight550-labeled anti-human IgM antibody (Bethyl Laboratories) and Dylight650-labeled anti-human IgG antibody (Bethyl Laboratories). Each chip was washed 3 x 10 min with TBST and scanned using a chip scanner (CapitalBio, LuxScan).
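The IgG-based dilution described above amounts to simple arithmetic, which the following minimal Python sketch illustrates. The function names, the example IgG concentration, and the final volume are hypothetical and are not taken from the study; only the 1.25 mg/dl target comes from the text.

```python
def dilution_fold(igg_mg_per_dl, target_mg_per_dl=1.25):
    """Fold dilution that brings a plasma sample to the target IgG concentration."""
    if igg_mg_per_dl <= 0 or target_mg_per_dl <= 0:
        raise ValueError("concentrations must be positive")
    return igg_mg_per_dl / target_mg_per_dl


def plasma_volume_ul(final_volume_ul, fold):
    """Volume of undiluted plasma needed for final_volume_ul of diluted sample."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return final_volume_ul / fold


# Hypothetical sample: 1000 mg/dl IgG measured by ELISA,
# and 100 ul of diluted plasma wanted (both values are assumptions).
fold = dilution_fold(1000.0)
volume = plasma_volume_ul(100.0, fold)
print(fold, volume)
```

Under these assumed values, the sample would need an 800-fold dilution, so samples with lower measured IgG would receive proportionally smaller dilution factors.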

Data processing and analysis
Signal intensity for each spot was extracted using GenePix Pro 6.0 software. We averaged the duplicates on each chip and then normalized all data by mean scaling. To maintain data quality, we applied principal component analysis to remove poorly performing chips. We further used supervised machine learning (16) to select a small set of markers; our selection process included significance analysis of microarrays, support vector machine, and 10-fold cross-validation. K-means clustering (17) was used to demonstrate the differentiation power of the selected markers. Finally, we calculated accuracy as the sum of true positives and true negatives divided by the total number of positive and negative samples.
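The accuracy definition above, together with the sensitivity and specificity reported later, can be written out as a short Python sketch. The function name and the confusion-matrix counts are hypothetical and serve only to illustrate the formulas; they are not data from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity, and accuracy from confusion-matrix counts."""
    positives = tp + fn  # all true cases, e.g. KD1 samples
    negatives = tn + fp  # all controls, e.g. NC samples
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "sensitivity": tp / positives,
        "specificity": tn / negatives,
        # (true positives + true negatives) / (positives + negatives)
        "accuracy": (tp + tn) / (positives + negatives),
    }


# Hypothetical counts for 20 cases and 20 controls (assumed values).
metrics = classification_metrics(tp=15, tn=18, fp=2, fn=5)
print(metrics)
```

When the two groups are the same size, accuracy equals the average of sensitivity and specificity, which is a quick consistency check for reported values.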

Creating focus array and plasma assays
The selected 73 proteins were purified and printed on the aldehyde slides (CapitalBio) using the same protocol that was used for the E. coli proteome microarray. The format was designed to fit a 14-well cassette (Figure 3). Each protein was printed in duplicate at 4 °C and then immobilized for 8 hours. Focus array chips were kept at -80 °C until they were needed for the experiments.
Additional plasma samples from the NC, KD1, KD3, and FC groups (n = 20) were probed using focus arrays. Focus arrays were blocked using 3% BSA for 1 hour. After blocking, the chips were washed with TBST for 10 min and assembled into 2 x 7 well cassettes (Arrayit).
The plasma samples were diluted 1:800 in TBST containing 1% BSA, and 100 μl was placed into each well. The chips were incubated for 1 hour with shaking, disassembled, and washed for another 3 x 10 min with TBST. The detection antibodies, washing steps, and scanning protocols were the same as described in the previous section.

Training and single-blind testing
To prevent potential bias, different personnel performed the plasma coding, plasma assay, and data analysis. Another set of plasmas was randomly coded X01-X80. Signal intensities were extracted with GenePix Pro 6.0 in the foreground-minus-background format. Once the duplicates were averaged, each block was normalized by mean scaling, and negative values were then replaced with 1. For the training section, all plasmas were labeled NC, KD1, KD3, or FC (n = 20). We applied a logistic regression model (LRM) to systematically combine different biomarkers, including IgG- and IgM-responsive proteins with different signal scales in the training group, and then evaluated the performance of these protein groups by examining the area under the curve (AUC). We conducted the LRM analysis using the "glm" function of the package "stats" and calculated the AUC score with the "performance" function from the package "ROCR" in the R environment (R version 3.0.3). In single-blind testing, we coded all the plasmas with random numbers and compared the results with the information provided by the pediatricians. The AUC score cutoff for the training group was set at 0.9, and each protein group was treated as a single factor. We further examined performance in the testing group, for which 20 additional patients were chosen. The performance of both the training and testing groups was evaluated by comparing accuracy, specificity, sensitivity, and AUC scores.
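The focus-array preprocessing steps described above (background subtraction, duplicate averaging, mean scaling, and replacement of negative values) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' pipeline: mean scaling is assumed here to mean division by the block mean, and the function name and example intensities are hypothetical.

```python
from statistics import mean


def preprocess_block(spots, floor=1.0):
    """Normalize one focus-array block.

    spots: one entry per protein, each a pair of duplicate spots,
           where every spot is a (foreground, background) tuple.
    Returns one normalized value per protein.
    """
    # Steps 1 and 2: foreground minus background, then average the duplicates.
    averaged = [mean([fg - bg for fg, bg in pair]) for pair in spots]
    # Step 3: mean scaling, assumed here to be division by the block mean.
    block_mean = mean(averaged)
    if block_mean <= 0:
        raise ValueError("block mean must be positive for mean scaling")
    scaled = [value / block_mean for value in averaged]
    # Step 4: replace negative values with 1, as stated in the text.
    return [floor if value < 0 else value for value in scaled]


# Hypothetical raw intensities for three proteins printed in duplicate.
block = [
    ((500.0, 100.0), (700.0, 100.0)),
    ((90.0, 100.0), (80.0, 100.0)),   # signal below background
    ((1300.0, 100.0), (1100.0, 100.0)),
]
print(preprocess_block(block))
```

The second protein in the example falls below background, so its negative scaled value is replaced with 1 while the other two values keep their scaled intensities.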

Physiological parameters of KD and control subjects
This study included 20 KD1, 20 KD3, 20 FC, and 20 NC subjects, whose plasma IgG levels were determined and whose samples were further investigated using E. coli proteome chips (Table 1).

Antibody profiling of KD and control subjects using E. coli proteome chips
Before carrying out the antibody profiling, we determined the IgG concentrations in the plasma samples. Surprisingly, we found some disease-related differences in the IgG concentrations (Supplementary Figure 1). Reduced IgG levels were observed only in KD1, not in NC or FC. These findings suggested that plasma IgG plays an important role in KD, thus giving us a strong reason to profile the antibodies.
We used high-density E. coli proteome chips, which consisted of ~4200 proteins,

Selecting markers for KD and control subjects from antibody profiling
To identify markers for healthy individuals and KD sufferers, we normalized signals from each chip with mean scaling and submitted them to supervised machine learning (Figure 1C), which included three steps: selecting significant hits, supervised learning, and cross-validation. We selected a small number of markers based on these strategies, which are shown in the heat map (Figure 2). In comparing the KD1 group to the NC group, we found 6 IgG markers with sensitivity 68%, specificity 94% and accuracy 81%, and 11 IgM markers with sensitivity 77%, specificity 94% and accuracy 84% (Figure 2A). In comparing the KD1 group to the FC group, we found 12 IgG markers with sensitivity 79%, specificity 90% and accuracy 84%, and 13 IgM markers with sensitivity 74%, specificity 90% and accuracy 82% (Figure 2B). In comparing the KD3 group to the NC group, we found 11 IgG markers with sensitivity 84%, specificity 100% and accuracy 92%, and 9 IgM markers with sensitivity 78%, specificity 100% and accuracy 89% (Figure 2C). In comparing the KD1 group to the KD3 group, we found 16 IgG markers with sensitivity 84%, specificity 84% and accuracy 84%, and 11 IgM markers with sensitivity 94%, specificity 63% and accuracy 78% (Figure 2D). In general, the markers indicating KD disease and those indicating KD recovery both showed accuracies between 78% and 92%.

Designing and creating focus arrays
Of the ~4200 different E. coli proteins, we successfully identified ~70 promising diagnostic and recovery markers. To further support their clinical value, we created focus arrays containing all of the IgG and IgM markers. We printed each marker in duplicate on each 75 mm x 25 mm slide and formed 14 identical blocks (Figure 3). To imitate clinical use, we did not adjust the dilution factor based on the IgG concentration, but instead set the dilution factor to 800-fold, which provided the best signal. Since 100 μl of 800-fold diluted plasma was used per well, this assay required only 125 nl of plasma, which is less than a drop of blood. We used an additional n = 20 non-coded plasmas in the training section and n = 20 coded plasmas for the single-blind testing (Figure 1C).

Training and single-blind testing
The receiver operating characteristic (ROC) curve provides a comprehensive summary of prediction accuracy (18,19); therefore, we used ROC curves to compare the protein biomarkers discovered by supervised machine learning. Furthermore, we performed logistic regression model (LRM) analysis to combine these protein biomarkers and determine whether the combinations would increase overall performance. Since each point on the ROC curve represents the true positive rate (sensitivity) and false positive rate (1 - specificity) at a particular cut-off value, the area under the curve (AUC) can be considered an overall performance indicator: an AUC score close to 1 indicates an excellent diagnostic test, whereas an AUC value close to 0.5 indicates a poor diagnostic test. We first examined the individual performance of selected proteins in the training set of KD1 vs. NC and found that AUC scores for the selected proteins ranged from 0.68 to 0.77 (the colored solid lines shown on the left of Figure 3B, Supplementary Table 3). After performing LRM analysis on these protein biomarkers, we created a combinational protein group using 17 IgM-responsive proteins, which increased the AUC score for the protein group to 0.91 (the black solid line on the left of Figure 3B).
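As an illustration of the AUC as an overall performance indicator, the following Python sketch computes it from scores for positive and negative samples. It uses the rank-based (Mann-Whitney) formulation, which equals the area under the empirical ROC curve; it is not the ROCR code used in this study, and the function name and example scores are hypothetical.

```python
def auc_score(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win, which matches the trapezoidal area
    under the empirical ROC curve.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predicted probabilities from a classifier (assumed values).
kd_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.2]
print(auc_score(kd_scores, control_scores))
```

A classifier that ranks every case above every control yields 1.0, while one whose scores carry no information yields a value near 0.5.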
The protein group also showed good performance in the single-blind test using an additional testing set of KD1 vs. NC, in which the AUC score reached 0.84 (the red solid line shown on the right of Figure 3B). Furthermore, most of the proteins in this group showed significant immunogenic responses in the KD1 group compared with the NC group in the single-blind test (Figure 3C; 14 of 17 proteins showed significantly different binding between KD1 patients and NC subjects, with a p-value smaller than 0.05). We also investigated conserved motifs in these IgM-responsive proteins and found a unique sequence, [EI]IDALVEK[QA][IL]L[EN][ERD]LI (Supplementary Figure 2). These proteins are also associated with nucleotide binding, localization extrinsic to the membrane, and pentose-glucuronate metabolism. Such results suggest that these IgM-responsive proteins may contribute to Kawasaki disease and that the consensus motif may provide predictive value for further testing.
Since children with Kawasaki disease have fevers, we also included children who had fever but were not diagnosed with Kawasaki disease as an additional group of negative controls. We selected 20 proteins using the training set; the AUC scores ranged from 0.56 to 0.63 (the colored solid lines on the left of Figure 4A, Supplementary Table 3). We further examined predictive power using this protein group in the testing set, and the AUC score was only 0.75 for this single-blind test (the red solid line on the right of Figure 4A). However, the boxplot demonstrated that none of the proteins significantly differentiated KD1 patients from FC controls (Figure 4B [...] Table 3). In the single-blind test, this protein group continued to have predictive power, and the AUC score was 0.99 for the testing set (the red solid line on the right of Figure 5A).
All nine of these proteins showed significant immunogenic responses against IgG antibodies in KD3 patients compared with the NC group (Figure 5B; all p-values are less than 0.001).
Regarding the KD1 group vs. the KD3 group, we discovered some proteins that showed excellent prognostic value for the transition from KD1 to KD3. Six proteins, selected using the training set of KD1 vs. KD3 groups, demonstrated very good performance on the ROC curve: the AUC scores ranged from 0.93 to 0.95, and the score for this protein group was 0.97 using LRM analysis (the black solid line shows the group result on the left of Figure 6A, Supplementary Table 3). In the single-blind test, the combination of this protein group also showed very good performance, resulting in an AUC score of 0.98 when using the additional testing set (the red solid line on the right of Figure 6A). As in the KD3 vs. NC comparison, these six proteins were also IgG-responsive proteins and demonstrated significant immunogenic responses in KD3 patients when compared with KD1 patients (Figure 6B; all p-values are less than 0.001).

Discussion
Kawasaki disease is a form of acute febrile vasculitis that most commonly affects children under the age of 5 years and can lead to multiple organ injuries, including injuries to the kidneys and heart. One common feature of KD patients is pyuria, which affects 30-80% of patients (20). Sterile pyuria most commonly occurs in KD patients ≤ 1 year of age.
Pyuria in KD patients is not always sterile, however, and can be the result of a urinary tract infection. E. coli and Klebsiella oxytoca have previously been reported as causative pathogens for Kawasaki disease (20,21). The clinical phenotypes do not differ between patients with and those without urinary tract infection. This evidence suggests that E. coli may be a relevant pathogen in KD.
In this study, we analyzed antibody profiles of KD1 and KD3 plasma and compared them with those of FC and NC. We found that KD3 antibody profiles are very different from those of NC and KD1, indicating that a substantial change occurs with Kawasaki disease development or IVIG treatment. In general, the half-life of IVIG in blood is 21 days (22), which may partially explain the antibody profile difference. Another possible reason is adaptive immunity, which produces IgG that peaks at about 21 days (23). Both IVIG and adaptive immunity could contribute to most, if not all, of the dramatic changes in the antibody profiles. Interestingly, all six classifying markers for KD3 vs. KD1 are also good classifiers for KD3 vs. NC, which suggests that the antibodies developed in KD3 against these E. coli proteins are so unique that neither KD1 nor NC subjects have them, thus indicating that adaptive immunity may take part in developing some unique antibodies from KD1 to KD3.
Of all the identified classifiers for KD1 vs. NC, KD1 vs. FC, KD3 vs. NC, and KD3 vs. KD1, we found that the znuC protein is the most common, which indicates that znuC is an antigen from KD1 to KD3. Interestingly, class differences appear in the anti-znuC antibodies, e.g., anti-znuC IgM in KD1 and IgG in KD3. ZnuC is an ATP-binding protein and a part of the ABC transporter for zinc. It is located on the bacterial surface and is vital for the survival of stress-induced bacteria (24). On the one hand, znuC is conserved in many bacterial species, such as E. coli, Klebsiella oxytoca, Shigella flexneri, Salmonella enterica, and Pasteurella multocida.
Both E. coli and Klebsiella oxytoca infections have been reported in KD (20,21). On the other hand, zinc is important for cardiovascular health in humans (25).
Cardiovascular inflammation is the most severe symptom of KD patients (2). Therefore, znuC is an ideal target for KD patients to develop a specific antibody, thus making it a perfect marker for KD diagnosis.
Our blind test results revealed that our identified classifying protein markers could effectively classify KD1 vs. NC, KD1 vs. FC, KD3 vs. NC, and KD3 vs. KD1 with an AUC value of at least 0.75 (Supplementary Table 2). Of those, KD3 patients could be easily distinguished from KD1 and NC. However, KD1 cannot be well separated from FC, which may be due to the complexity of the fever controls. Further studies are needed to separate FC subtypes and compare them with KD1. This study combines proteome microarray screening, bioinformatics analysis, and validation to identify reliable markers for KD diagnosis. We required only a drop of blood for our specimens. These platforms and markers may improve the understanding of the diagnosis and prognosis of KD.

Conflict of Interest Statement
All of the authors declare that they have no financial relationships and no conflicts of interest to disclose with regard to this article.

[...] intensities of KD1 vs. FC markers in the single-blind test. Data are presented as mean ± SEM and were analyzed by Student's t-test; * p < 0.05, ** p < 0.01, and *** p < 0.001.