Integration Of IgA and IgG Autoantigens Improves Performance Of Biomarker Panels For Early Diagnosis Of Lung Cancer

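In Phase I, serum samples collected from 69 early stage lung cancer (LC) patients, 30 healthy individuals, and 25 patients with lung benign lesions (LBL) were profiled on HuProt arrays to identify candidate IgA autoantigens. In Phase II, the candidates were tested with ELISA-format assays in a cohort of 136 early stage LC patients, 58 healthy individuals, and 29 LBL patients. Integration of the IgA and IgG autoimmune profiles allowed us to identify and validate a biomarker panel of three IgA autoantigens (i.e., BCL7A, TRIM33, and MTERF4) and three IgG autoantigens (i.e., CTAG1A, DDX4, and MAGEC2) for diagnosis of early stage LC with 73.5% sensitivity at >85% specificity. In Phase III, the performance of this biomarker panel was validated with an independent cohort comprised of 88 early stage LC patients, 18 LBL patients, and 36 healthy subjects. Finally, a blind test was performed to confirm the performance of the biomarker panel. In summary, this study describes for the first time an integrated panel of IgA and IgG autoantigens for early diagnosis of LC.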


Introduction
Lung cancer (LC) remains the leading cause of mortality from malignant tumors worldwide (1,2). According to the World Health Organization (WHO), among the 8.8 million cancer-related deaths in 2015, LC caused 1.69 million deaths worldwide (3). In the most populated country, China, LC alone is responsible for a mortality of 42.05 per 100,000 persons (4). Recent studies have provided growing evidence that cancer patients can mount humoral immune responses to autologous cellular antigens, termed tumor-associated antigens (TAAs), in a wide range of cancer types, including colorectal cancer, gastric cancer, lung cancer, and ovarian cancer (5-12). For example, autoantibodies against TAAs can be readily detected at the time of initial diagnosis of LC, and may appear several months or even years prior to clinical symptoms (13,14). Indeed, IgG autoantibodies that target autoantigens such as p53, CTAG1A, cyclin Y, ubiquilin 1, livin, and survivin are readily detectable in serum samples collected from LC patients (15-18). Therefore, the discovery of noninvasive serological biomarkers that yield high sensitivity and specificity for early stage LC diagnosis holds great promise for intervention in and prevention of LC.
Interestingly, in addition to IgG autoantibodies, IgA autoantibodies against TAAs are also abundantly found in sera of patients diagnosed with several types of cancers. For example, patients diagnosed with breast carcinoma have been reported to have elevated levels of IgA autoantibodies targeting calreticulin, a multicompartmental protein involved in the regulation of many important cellular responses, as compared to matched controls. This phenomenon was found to be closely associated with lymph node metastasis (19). Dihydrolipoamide dehydrogenase (DLD), a mitochondrial protein, has been identified as an autoantigen specific to endometrial cancer using two-dimensional immunoblotting, and IgA, rather than IgG, autoantibodies against DLD were shown to be a potential serological biomarker, as evidenced by markedly elevated levels of IgA autoantibodies in patient sera (20). Compared with other organs, the lung has the largest mucosal surface, where IgA plays an important role in mucosal immunity and protects the lung against invading pathogens (21,22). It is conceivable that IgA antibodies could recognize a different repertoire of TAAs from IgG antibodies due to cross-reactivity and/or differences in the immune responses in the lung. Therefore, we expected that the integration of IgA and IgG autoimmune profiles would yield combined marker panels with improved performance.
In a recent study, our team employed a human proteome array (i.e., HuProt)-based approach and identified a panel of biomarkers, comprised of three IgG autoantigens, namely p53, HRas, and ETHE1, with 50% sensitivity at 90% specificity for early stage LC diagnosis (10). Here, we employed the same HuProt array platform to survey for IgA-bound autoantigens, followed by standard ELISA tests for biomarker validation (23). The validated IgA autoantigens were then combined with the previously validated IgG autoantigens for the identification and validation of integrated IgA/IgG biomarker panels. We discovered and validated an integrated biomarker panel, comprised of three IgA autoantigens (i.e., BCL7A, TRIM33, and MTERF4) and three IgG autoantigens (i.e., CTAG1A, DDX4, and MAGEC2), with the best performance of 73.5% sensitivity at 85% specificity for LC diagnosis at early stages.

Cohort description
All serum samples involved in this study were collected at Fujian Provincial Hospital, in Fujian Province, China, between 2015 and 2018. The full cohort comprised 667 serum samples collected from 171 healthy persons, 400 resident patients diagnosed with early stage LC, and 96 resident patients diagnosed with lung benign lesions (LBL). Of these, samples from 124 healthy persons, 293 early stage LC patients, and 72 LBL patients were used for biomarker discovery and validation, and the remaining 178 samples were reserved as a blind test set. These 124 healthy persons were recruited during annual physical examinations, which included chest X-ray, abdominal ultrasonography, routine urinalysis, stool occult blood test, complete blood count, blood chemistries, and tumor antigen tests such as carcinoembryonic antigen (CEA), CA199, and alpha-fetoprotein (AFP). None of them showed evidence of malignancy in any of the tests. The 293 early stage LC patients were recruited after histopathological confirmation of LC tumors. The TNM classification was used for evaluation of NSCLC staging, and the VA scheme was used to classify SCLC into limited and extensive stages. The 72 LBL patients, including 31 with pneumonia, 16 with chronic obstructive pulmonary disease (COPD), and 25 with pulmonary tuberculosis (TB), were recruited after clinical assessment. Detailed information on each subject of this cohort is listed in Supplementary Table 1. This study was approved by the Ethics Committee (i.e., IRB) of Fujian Provincial Hospital.

HuProt arrays and serum profiling assays
HuProt arrays were manufactured by CDI Laboratories, Inc. Each HuProt array v3.0 is comprised of 20,240 unique human full-length proteins, covering ~75% of the human proteome. Each serum sample was diluted 1000-fold in PBS and profiled on HuProt arrays using a standard protocol as described previously (24-27). Briefly, each HuProt array was first incubated in blocking buffer (3% BSA in PBS with 0.1% Tween 20), after which 150 μl of the diluted serum sample was added and incubated at 37 °C for 1 h. After 3 × 10 min washes with PBST, the microarray was incubated with 150 μl of 1:1000 diluted Alexa Fluor 532-conjugated goat anti-human IgA (the Jackson Laboratory, Bar Harbor, ME) at 37 °C for 1 h in the dark. Finally, after 3 × 10 min PBST washes, the microarray was rinsed with double-distilled H2O and dried. The microarray was scanned with the GenePix 4000B Microarray Scanner (Molecular Devices, Sunnyvale, CA) and analyzed using GenePix Pro 6.0 software (Molecular Devices).

Data analysis for assays performed on HuProt
First, the median values of the foreground (Fij) and background (Bij) intensity at a given protein spot (i,j) on the HuProt arrays were extracted. The signal intensity (Rij) of each protein spot was defined as Fij/Bij. Since each protein is printed in duplicate on an array, Rij was averaged for each protein as Rp.
The Z-score of each protein on the HuProt arrays was calculated using a method similar to the one described in our previous studies (24). A cutoff value of Z ≥ 3 was used to determine the positives in this study. The sensitivity and specificity were calculated for each protein. For each comparison (LC vs negative controls), the biomarker candidates with the highest discriminant ability, as defined previously (28), were selected. P values were obtained from the t test and adjusted as false discovery rates (29). The optimal cutoff value for each candidate was chosen according to two criteria: 1) at least 90% specificity and 2) the highest discriminant ability.
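The signal extraction and positive calling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact Z-score method is given in reference (24) and is not reproduced here, so the sketch assumes a per-array Z-score, (Rp - mean) / SD over all proteins on one array. The function names and the toy intensities are hypothetical.

```python
import numpy as np

def spot_ratios(foreground, background):
    """R_ij = F_ij / B_ij from median foreground and background intensities."""
    return np.asarray(foreground, dtype=float) / np.asarray(background, dtype=float)

def protein_signals(ratios):
    """Average the duplicate spots of each protein into R_p.

    `ratios` has shape (n_proteins, 2): one row per protein, one column per spot.
    """
    return np.asarray(ratios, dtype=float).mean(axis=1)

def z_scores(rp):
    """ASSUMPTION: Z = (R_p - mean) / SD over all proteins on one array."""
    rp = np.asarray(rp, dtype=float)
    return (rp - rp.mean()) / rp.std(ddof=1)

def positive_hits(rp, cutoff=3.0):
    """Boolean mask of proteins called positive at Z >= cutoff."""
    return z_scores(rp) >= cutoff

# Toy array (hypothetical numbers): 100 proteins printed in duplicate.
foreground = np.full((100, 2), 1000.0)
foreground[::2] = 1100.0          # mild protein-to-protein variation
foreground[7] = [5200.0, 4800.0]  # one strongly reactive protein
background = np.full((100, 2), 1000.0)

rp = protein_signals(spot_ratios(foreground, background))
hits = positive_hits(rp)
print(np.flatnonzero(hits))  # indices of proteins called positive
```

In this toy array only the protein at index 7 clears the Z ≥ 3 cutoff; real arrays would need the normalization steps of the cited protocol before this calculation.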

ELISA Assay
To develop ELISA-format tests, candidate proteins were purified from yeast as described previously (30). After 50 ng of each purified protein was coated onto individual wells of an ELISA plate, each serum sample, diluted either 100-fold for IgA biomarkers or 500-fold for IgG biomarkers, was added to the wells to carry out the standard ELISA tests as described previously (30). The immunoreactivity signals were measured by reading the A450.

Discovery and validation of biomarker panels
After signals of the ELISA assays were obtained and normalized using serum samples of the discovery cohort, areas under the receiver operating characteristic (ROC) curves (AUCs) were calculated to assess the performance of each candidate biomarker. The optimal cutoff values were obtained to determine the sensitivity and specificity for the six proteins in the validation cohort (10). The performance of all possible combinations of two to six proteins was evaluated to identify combinatorial biomarker panels with better performance, as follows. First, the signal intensity of each protein was converted to either 1 or 0, such that 1 represented a signal intensity greater than the optimal cutoff value, and 0 otherwise. Next, for a given combination of n proteins, the sum of the binary scores of the n proteins was assigned to each serum sample as a summary score. If the summary score of a sample was equal to or greater than k (1 ≤ k ≤ n), the sample was called positive. The sensitivity and specificity at the best discriminant ability values were recorded for each combination. Finally, the combination and its k value with the best discriminant ability at a minimum specificity of 85% were identified. The proteins of the best combination were further validated on an independent cohort and evaluated using a similar approach.
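The panel search described above can be illustrated with a short Python sketch. It rests on two assumptions that are not taken from the text: the discriminant ability is replaced by Youden's J (sensitivity + specificity - 1) as a stand-in, because the formula of reference (28) is not given here, and a sample is called positive when at least k panel proteins exceed their cutoffs, which is the reading that matches k = 1 meaning "at least one". All names and the toy data are hypothetical.

```python
from itertools import combinations

def binarize(signals, cutoffs):
    """Convert each protein's signals to 1/0 calls against that protein's cutoff."""
    return {p: [1 if v > cutoffs[p] else 0 for v in vals]
            for p, vals in signals.items()}

def panel_calls(binary, panel, k):
    """Call a sample positive when at least k proteins of the panel are positive."""
    n_samples = len(next(iter(binary.values())))
    return [sum(binary[p][i] for p in panel) >= k for i in range(n_samples)]

def sens_spec(calls, is_case):
    """Sensitivity and specificity of boolean calls against true case labels."""
    tp = sum(c and y for c, y in zip(calls, is_case))
    tn = sum((not c) and (not y) for c, y in zip(calls, is_case))
    n_case = sum(is_case)
    return tp / n_case, tn / (len(is_case) - n_case)

def best_panel(binary, is_case, min_size=2, max_size=6, min_spec=0.85):
    """Exhaustively search panels and k values; keep the best score at spec >= min_spec."""
    best = None
    proteins = sorted(binary)
    for n in range(min_size, min(max_size, len(proteins)) + 1):
        for panel in combinations(proteins, n):
            for k in range(1, n + 1):
                sens, spec = sens_spec(panel_calls(binary, panel, k), is_case)
                if spec < min_spec:
                    continue
                score = sens + spec - 1  # ASSUMPTION: Youden's J as stand-in metric
                if best is None or score > best[0]:
                    best = (score, panel, k, sens, spec)
    return best

# Toy data (hypothetical): 4 LC samples followed by 4 controls, 3 proteins.
signals = {
    "A": [2, 0, 0, 0, 0, 0, 0, 0],
    "B": [0, 2, 2, 0, 0, 0, 0, 0],
    "C": [0, 0, 0, 2, 2, 2, 0, 0],
}
cutoffs = {"A": 1, "B": 1, "C": 1}
is_case = [True] * 4 + [False] * 4

result = best_panel(binarize(signals, cutoffs), is_case)
print(result)
```

In the toy data, proteins A and B are each positive in different LC samples and in no control, so combining them at k = 1 raises sensitivity without losing specificity; this is the complementarity that the combinatorial search is designed to find.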

Immunohistochemistry (IHC)
Formalin-fixed, paraffin-embedded 3-μm thick sections were deparaffinized and rehydrated. These preparations were stained at room temperature. Staining was performed using an SP immunohistochemistry kit (Fuzhou Maixin Biotech Co., Ltd, China), following the manufacturer's recommendations. The sections were incubated with primary antibodies against BCL7A (HPA019762, Sigma-Aldrich, USA), TRIM33 (CA9501, Takara, Japan), MTERF4 (Ab121910, Abcam, UK), CTAG1A (Sc53869, Santa Cruz Biotechnology, USA), DDX4 (mAbcam27591, Abcam, UK), or MAGEC2 (EPR19064, Abcam, UK) for 18 h at 4 °C, followed by incubation with the biotinylated secondary antibody for 10 min at room temperature and with horseradish peroxidase-conjugated streptavidin for 10 min at room temperature. The immunoreactivities were visualized in brown with diaminobenzidine (DAB kit; Lab Vision) and the sections were counterstained with Mayer's hematoxylin. For negative controls, the primary antibody was replaced with non-immune sera, keeping all other steps in the process the same. All specimens were processed under identical conditions. The outcome of IHC staining for 60 pairs of LC tissues and matched paracancerous tissues, randomly selected from 32 cases of adenocarcinoma, 20 cases of squamous cell carcinoma, and 8 cases of small cell lung cancer, was manually evaluated and scored by two independent certified pathologists, and any disagreements were settled by discussion. The intensity of staining was graded as 0 = undetectable, 1 = weak staining, 2 = moderate staining, and 3 = strong staining, while the proportion of positive cells within a tissue was scored as 0: 0-1% of cells stained, 1: 2-25% of cells stained, 2: 26-75% of cells stained, and 3: >75% of cells stained (31). The total IHC score was the sum of the intensity of staining and the proportion of positive cells, with 0-3 as negative IHC and 4-6 as positive IHC.
All of the tissues used in IHC staining were acquired from the archives of the Department of Pathology of Fujian Provincial Hospital, in agreement with the Ethics Committee of Fujian Provincial Hospital.
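The IHC scoring scheme described in this section reduces to a small lookup, sketched below in Python. The function names are hypothetical, and the handling of non-integer percentages (for example 1.5%) is an assumption: each bin is treated as closed at its upper bound.

```python
def proportion_score(percent_positive):
    """Map the percentage of stained cells to a 0-3 proportion score.

    ASSUMPTION: bins are closed at their upper bound (<=1, <=25, <=75, >75).
    """
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if percent_positive <= 1:
        return 0
    if percent_positive <= 25:
        return 1
    if percent_positive <= 75:
        return 2
    return 3

def ihc_total(intensity, percent_positive):
    """Total IHC score = staining intensity (0-3) + proportion score (0-3)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    return intensity + proportion_score(percent_positive)

def ihc_positive(intensity, percent_positive):
    """Scores of 4-6 are positive IHC; scores of 0-3 are negative IHC."""
    return ihc_total(intensity, percent_positive) >= 4

# Moderate staining in half of the cells: 2 + 2 = 4, called positive.
print(ihc_total(2, 50), ihc_positive(2, 50))
```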

Overall study design
To identify IgA autoantigens as diagnostic biomarkers for early stage LC (e.g., stages 1 and 2), we employed the two-phase strategy reported previously (10) (Fig. 1). Briefly, in Phase I, 124 serum samples collected from 69 early LC patients, 30 healthy individuals, and 25 LBL patients were individually profiled on HuProt arrays for the presence of IgA-bound autoantigens (Table 1). A total of 28 human proteins were selected as candidate autoantigens (Table 2). In Phase II, a much larger cohort, comprised of serum samples collected from 136 early stage LC patients, 58 healthy subjects, and 29 LBL patients (Table 3), was assembled and tested against the candidate IgA autoantigens and some previously identified IgG autoantigens using ELISA. A combinatorial IgA/IgG autoantigen panel with much improved performance was identified and further validated using an independent cohort (Table 4).

Discovery of candidate IgA autoantigens as serological biomarkers in LC
In Phase I, we assembled a cohort of 124 serum samples collected from 69 early stage LC patients, 30 healthy subjects, and 25 LBL patients for candidate biomarker identification (Table 1; Supplementary Table 1). Statistical analyses did not show any significant differences between the LC, LBL, and healthy groups in terms of age, gender, or smoking history composition (Table 1).
To obtain anti-human IgA autoimmune profiles, each serum sample was diluted 1000-fold and individually incubated on a HuProt array, followed by the detection of autoantigens using Cy5-labeled anti-human IgA secondary antibodies. Anti-IgA signals were acquired, normalized, and quantified for each assay, and the standard deviation (S.D.) was calculated from these signals (24). Using a cutoff value of Z score ≥ 3, IgA-bound target proteins were determined for each serum sample. For example, BCL7A and MTERF4 were preferentially recognized by human IgAs in the LC patients, but much less so in healthy subjects and LBL patients (Fig. 2A). Sensitivity and specificity values were calculated for each serum-positive protein, and the discriminative ability values were calculated from them as described previously (see Methods; (10, 28)).
To determine which of the IgA-bound autoantigens would be subjected to Phase II validation using standard ELISA-format testing, we first selected those with specificity values >85% and then ranked them according to their discriminative ability values (see Methods). Of the 72 candidate IgA autoantigens, we selected the top 20 candidates and eight additional ones that were either highly expressed in lung cancer on the basis of tissue pathology (e.g., TPM3 and TTC1) or functionally relevant in tumorigenesis (e.g., MAGEC2 and BCL7A) (Table 2) (32).

Validation of IgA autoantigens as biomarkers for early stage LC diagnosis with ELISA
We collected serum samples from 136 patients diagnosed with early stage LC, including 24 with limited stage SCLC, 64 with stage I/II adenocarcinoma, and 48 with stage I/II squamous-cell carcinoma. Negative controls included 58 healthy subjects and 29 LBL patients. Statistical analysis did not find any significant differences in age, gender, or smoking history between the LC group and the control groups (Table 3; Supplementary Table 1). All of the 28 selected candidate IgA autoantigens were successfully purified as recombinant proteins from yeast, and the quantity and quality of the purified proteins were examined with Coomassie stain as previously described (33).
To carry out ELISA-format testing, each candidate autoantigen was coated onto individual wells of an ELISA plate and incubated with serum samples diluted either 100-fold for IgA biomarkers or 500-fold for IgG biomarkers (30). To investigate reproducibility, two proteins, IgA-based TRIM33 and IgG-based CTAG1A, were each measured repeatedly in two samples. For each protein in each sample, ten repeats were done in one batch, and another ten repeats were done on 10 consecutive days. Repeatability within a batch and batch-to-batch reproducibility were investigated by calculating the standard deviation and correlations of the repeat signals across batches (Supplementary Fig. 1). The coefficients of variation (CVs) within a batch ranged from 4.7% to 10.0%, while the CVs across batches ranged from 13.4% to 21.4%. The signal intensities of the two samples were correlated across batches.
The results indicate that there is some batch effect, although the repeatability within a batch is good. Therefore, positive hits were first identified batch by batch and then combined for further analysis. ELISA testing on the new cohort was then performed. After the ELISA signals were acquired and normalized to the negative controls, we performed box plot analysis, obtained the receiver operating characteristic (ROC) curves, and calculated area under the curve (AUC) values to assess the performance of each candidate (Fig. 3). The AUC values ranged from 0.503 to 0.673 for the 28 candidates, and their sensitivity values were between 6.6% and 32.4% at a specificity value > 90% (Table 5). The analysis of the ELISA data with this new cohort validated the majority of the candidate biomarker proteins identified with the HuProt array approach.
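For readers less familiar with these metrics, the following Python sketch shows one way to compute an AUC and the sensitivity at a fixed minimum specificity for a single candidate. It is an illustration only, not the analysis pipeline used in this study: the AUC is computed through its rank interpretation (the probability that a randomly chosen case has a higher signal than a randomly chosen control, with ties counted as one half), the specificity floor follows the "at least 90%" criterion given in the Methods, and the signals are hypothetical.

```python
def auc(case_signals, control_signals):
    """AUC as P(case > control), counting ties as 1/2."""
    wins = 0.0
    for c in case_signals:
        for h in control_signals:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_signals) * len(control_signals))

def sensitivity_at_specificity(case_signals, control_signals, min_spec=0.90):
    """Return (sensitivity, cutoff) with the highest sensitivity at spec >= min_spec.

    A sample is positive when its signal is strictly greater than the cutoff.
    Returns (0.0, None) if no cutoff reaches the requested specificity.
    """
    best_sens, best_cutoff = 0.0, None
    for cutoff in sorted(set(case_signals) | set(control_signals)):
        spec = sum(h <= cutoff for h in control_signals) / len(control_signals)
        if spec < min_spec:
            continue
        sens = sum(c > cutoff for c in case_signals) / len(case_signals)
        if best_cutoff is None or sens > best_sens:
            best_sens, best_cutoff = sens, cutoff
    return best_sens, best_cutoff

# Hypothetical normalized ELISA signals for one candidate autoantigen.
cases = [0.2, 0.5, 0.9, 1.1]
controls = [0.1, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.5, 0.6, 1.0]

print(auc(cases, controls))
print(sensitivity_at_specificity(cases, controls))
```

The toy candidate has a modest AUC and detects only half of the cases at 90% specificity, which mirrors the situation reported above for individual autoantigens and motivates combining several of them into a panel.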

Discovery and validation of an integrated IgA/IgG biomarker panel
We noticed that the sensitivity value of each newly validated IgA biomarker was not very high, similar to what we observed with individual IgG biomarkers in our previous studies. Therefore, we hypothesized that integration of IgA and IgG biomarkers would yield combined marker panels with improved performance.
To test this hypothesis, we tested the IgG seroreactivity of the same validation cohort used above against the eight top IgG biomarkers described in our previous studies (10), as well as TRIM33, MAGEC2, DDX4, RALGDS, BCL7A, SULT2B1, DCLK1 and NOL3, which are validated IgA autoantigens known to be involved in cell proliferation (34-41). As expected, the performance of the previously identified eight IgG biomarkers recapitulated our previous studies, and the additional eight candidates also showed comparable performance at specificity >90% (Table 6).
To identify the optimal combination of IgA/IgG biomarker panels, we exhaustively evaluated the performance of all possible combinations of two to six proteins (= 8,295,001 combinations). Using the same computational approach described previously (10), we identified the best combination, which was comprised of three IgA autoantigens (i.e., BCL7A, TRIM33, and MTERF4) and three IgG autoantigens (i.e., CTAG1A, DDX4, and MAGEC2). This panel achieved 73.5% sensitivity at 85.1% specificity with a k value of 1. In other words, a serum sample was scored positive when at least one (i.e., k = 1) of the six proteins showed a signal intensity greater than the corresponding optimal cutoff value (Fig. 4).
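The size of this search space can be checked with a few lines of Python. The pool size of 44 used below is an inference rather than a number stated in the text: it corresponds to the 28 IgA candidates plus the 16 IgG candidates (eight previously described and eight additional), and it is the pool size for which the count of two- to six-protein panels equals the reported figure. The function name is hypothetical.

```python
from math import comb

def n_panels(pool_size, min_size=2, max_size=6):
    """Number of distinct panels containing min_size..max_size proteins."""
    return sum(comb(pool_size, n) for n in range(min_size, max_size + 1))

# ASSUMPTION: 28 IgA + 16 IgG candidates = 44 proteins in the pool.
print(n_panels(28 + 16))
```

Each panel is additionally evaluated at every k from 1 to n, so the number of (panel, k) pairs examined is several times larger than the number of panels.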
To validate this integrated biomarker panel, we tested these six autoantigens on 142 serum samples collected from 88 early stage LC patients, 36 healthy subjects, and 18 LBL patients. Using the same method as described above, 68.2% of the early stage LC samples were scored as positive, whereas only 8.3% of the healthy samples and 22.2% of the LBL samples were scored as false positives. Therefore, this biomarker panel showed 68.2% sensitivity at 87.0% specificity for early LC diagnosis in the validation cohort (Fig. 4).
The remaining 178 serum samples were analyzed as a blind test set to test the biomarker panel. Since the sample information was unknown before data analysis, the positive hits for each autoantigen were identified using the approach described in the Methods for the HuProt array analysis. After the six autoantigens were combined, 79 samples were identified as positive. The sample annotations were unblinded at this stage to evaluate the performance. In the blind test, the biomarker panel showed 62.6% sensitivity at 83.1% specificity (Fig. 4).
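As a consistency check, the reported sensitivities and specificities can be re-derived from sample counts. The counts in the Python sketch below do not appear in the text: they are back-calculated from the reported percentages and cohort sizes (for example, 60 of 88 LC samples positive in the validation cohort, and a blind set of 107 LC and 71 control samples obtained by subtracting the discovery and validation samples from the full cohort), so they should be read as an inferred reconstruction. The function name is hypothetical.

```python
def sens_spec(tp, n_case, fp, n_ctrl):
    """Sensitivity = TP / cases; specificity = (controls - FP) / controls."""
    return tp / n_case, (n_ctrl - fp) / n_ctrl

# ASSUMPTION: all counts are inferred from the reported percentages.
cohorts = {
    # name: (true positives, LC samples, false positives, control samples)
    "discovery": (100, 136, 13, 58 + 29),
    "validation": (60, 88, 3 + 4, 36 + 18),   # 3 healthy + 4 LBL false positives
    "blind test": (67, 107, 12, 71),           # 67 + 12 = 79 positive samples
}

summary = {}
for name, (tp, n_case, fp, n_ctrl) in cohorts.items():
    sens, spec = sens_spec(tp, n_case, fp, n_ctrl)
    summary[name] = (round(100 * sens, 1), round(100 * spec, 1))
    print(name, summary[name])
```

Under these inferred counts, the rounded values agree with the figures reported for the discovery, validation, and blind test sets.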
In all three data sets, the performance of the biomarker panel showed no significant difference between smokers and non-smokers (p value > 0.05, chi-square test). These results indicate that the diagnostic performance of the panel is not biased toward either smokers or non-smokers.

Discussion
In early studies, humoral IgA antibodies targeting Epstein-Barr virus (EBV)-encoded viral antigens, such as VCA, EA and EBNA, were found to serve as surrogate biomarkers for prognosis of nasopharyngeal carcinoma (42-45). In recent studies, IgA autoantibodies against calreticulin and IgA autoantibodies against the mitochondrial protein DLD were reported as biomarkers for breast cancer and endometrial cancer, respectively (20, 46). Interestingly, no significant signals of the IgG autoantibodies targeting the same autoantigens were found in either cancer. These results imply that IgA and IgG autoimmune responses can be quite different from each other in cancer patients and, therefore, that integrated detection of IgA and IgG autoantibodies might further improve the power of a biomarker panel in early diagnosis of cancer.
To test this hypothesis, we decided to employ HuProt arrays to carry out an unbiased, comprehensive survey for IgA-bound autoantigens for early diagnosis of LC. In Phase I, 72 candidate IgA autoantigens were identified, 28 of which were selected for validation using ELISA in Phase II. The AUC values ranged from 0.503 to 0.673 for the 28 candidates and their sensitivity values were found between 6.6% and 32.4% at >90% specificity. To discover integrated biomarker panels comprised of IgA and IgG autoantigens, we re-tested the IgG autoimmune profiles and identified and validated a biomarker panel of three IgA autoantigens (i.e., BCL7A, TRIM33 and MTERF4) and three IgG autoantigens (i.e., CTAG1A, DDX4 and MAGEC2) for diagnosis of early stage LC with 73.5% sensitivity at >85% specificity.
Our study design had several strengths (47). First, we employed the most comprehensive human proteome (HuProt) arrays, with ~75% coverage of the human proteome, to improve the likelihood of finding potential biomarkers. Second, we recruited 293 LC patients who were diagnosed with three LC subtypes at early stages, with the aim of finding robust LC biomarkers. Third, we combined the LBL samples with healthy subjects as negative control groups to enable better discrimination of malignant from benign lesions. Finally, ELISA was used as an independent platform to validate the newly discovered biomarkers and to identify integrated IgA/IgG biomarker panels.
Among the six autoantigens in the biomarker panel, the antigenicity of TRIM33, CTAG1A, DDX4, and MAGEC2 has been reported in diseases and cancers (34, 48-51). We further investigated the expression of the six proteins in lung cancer and corresponding paracancerous tissues collected from 60 patients by immunohistochemical (IHC) analysis. The total IHC score was the sum of the intensity of staining and the proportion of positive cells, with 0-3 as negative IHC and 4-6 as positive IHC. For all six proteins, tumors showed significantly higher IHC-positive rates than the adjacent normal tissues.
A limitation of this study is that only serum samples collected in China were employed, raising a possibility, though remote, that there could exist some ethnicity bias. Therefore, further validation studies with serum samples collected from other ethnic groups are necessary to confirm the performance of this biomarker panel.
In summary, we performed a comprehensive autoantibody-based survey for the discovery and validation of serum biomarkers for early LC diagnosis. It is important to note that, since the serum samples were collected from patients at diagnosis, the biomarkers identified in this study were not identified in an LC screening cohort. Therefore, it would be important in the future to examine the performance of the biomarker panel with serum samples collected before a person shows any LC-relevant pulmonary symptoms. Furthermore, since some genes are known to be mutated in LC, we believe that inclusion of mutated proteins on the protein arrays may further improve the accuracy of LC diagnosis and reduce false positive rates.

Fig. 2. Examples of IgA-bound autoantigens identified on HuProt arrays in Phase I.
A. Anti-human IgA images of BCL7A and MTERF4 obtained with serum samples collected from an LC patient, a healthy subject, and an LBL patient. IgA-bound autoantigens were visualized with a Cy3-labeled anti-human IgA secondary antibody on HuProt arrays. In both cases, BCL7A and MTERF4 were specifically recognized by IgA antibodies of an LC patient; no detectable signals were observed with a healthy or LBL serum. B. Box plot analysis of HuProt array profiling of BCL7A (upper panel) and MTERF4 (lower panel) in LC, healthy, and LBL samples.

Fig. 3. Examples of validated IgA autoantigens using ELISA tests in Phase II.
Left: Box plot analysis of ELISA results obtained with BCL7A and MTERF4 in Phase II validation. The results clearly show that the signal intensities of the two proteins are significantly higher in the early LC group than in the control groups. Right: ROC analysis of BCL7A (upper panel) and MTERF4 (lower panel). Values of AUC, sensitivity, and specificity obtained at the optimal cutoff value for each protein are also shown.

Fig. 4. Performance of an integrated IgA/IgG biomarker panel in the discovery and validation stages in Phase II.
A. Performance of the identified biomarker panel in early LC, healthy, and LBL groups in the discovery and validation stages. The orange and light blue bars represent the positive and negative signals of each individual biomarker (in rows) scored in each serum sample (in columns), respectively. IgA and IgG autoantigens are indicated with "-A" and "-G", respectively. B. Tabulation of the performance and positive rate of each category of the biomarker panel is shown at the bottom.