A Robust Approach for Identification of Cancer Biomarkers and Candidate Drugs

Background and objectives: Identification of cancer biomarkers that are differentially expressed (DE) between two biological conditions is an important task in many microarray studies. There exist several methods in the literature in this regards and most of these methods designed especially for unpaired samples, those are not suitable for paired samples. Furthermore, the traditional methods use p-values or fold change (FC) values to detect the DE genes. However, sometimes, p-value based results do not comply with FC based results due to the smaller pooled variance of gene expressions, which occurs when variance of each individual condition becomes smaller. There are some methods that combine both p-values and FC values to solve this problem. But, those methods also show weak performance for small sample cases in the presence of outlying expressions. To overcome this problem, in this paper, an attempt is made to propose a hybrid robust SAM-FC approach by combining rank of FC values and rank of p-values computed by SAM statistic using minimum β-divergence method, which is designed for paired samples. Materials and Methods: The proposed method introduces a weight function known as β-weight function. This weight function produces larger weights corresponding to usual and smaller weights for unusual expressions. The β-weight function plays the significant role on the performance of the proposed method. The proposed method uses β-weight function as a measure of outlier detection by setting β = 0.2. We unify both classical and robust estimates using β-weight function, such that maximum likelihood estimators (MLEs) are used in absence of outliers and minimum β-divergence estimators are used in presence of outliers to obtain reasonable p-values and FC values in the proposed method. Results: We examined the performance of proposed method in a comparison of some popular methods (t-test, SAM, LIMMA, Wilcoxon, WAD, RP, and FCROS) using both simulated and real gene expression profiles for both small and large sample cases. From the simulation and a real spike in data analysis results, we observed that the proposed method outperforms other methods for small sample cases in the presence of outliers and it keeps almost equal performance with other robust methods (Wilcoxon, RP, and FCROS) otherwise. From the head and neck cancer (HNC) gene expression dataset, the proposed method identified two additional genes (CYP3A4 and NOVA1) that are significantly enriched in linoleic acid metabolism, drug metabolism, steroid hormone biosynthesis and metabolic pathways. The survival analysis through Kaplan–Meier curve revealed that combined effect of these two genes has prognostic capability and they might be promising biomarker of HNC. Moreover, we retrieved the 12 candidate drugs based on gene interaction from glad4u and drug bank literature based gene associations. Conclusions: Using pathway analysis, disease association study, protein–protein interactions and survival analysis we found that our proposed two additional genes might be involved in the critical pathways of cancer. Furthermore, the identified drugs showed statistical significance which indicates that proteins associated with these genes might be therapeutic target in cancer.


Introduction
One of the important and common objectives of microarray experiments is to detect the genes that can differentiate biological samples into two biological conditions. These genes are called differentially expressed genes (DEGs). Biomarkers are the few DEGs that have prognostic capability of a specific disease. In the earlier days of microarray experiments, the simplest method fold change (FC) was used to identify the DEGs or EEGs (equally expressed genes) [1,2]. In this method, log-ratio of gene expressions between two conditions is calculated and a particular gene is said to be DEG, for which the absolute value of FC exceeds an arbitrary cut-off value that depends on user interest. There are several FC based methods for DEGs selection such as average difference (AD), weighted average difference (WAD), FC, and distributional fold change (DFC) [3,4].
On the other hand, there are two types of statistical approaches based on p-values to select the DEGs. One is parametric approach and the other is nonparametric approach. In parametric approach, the t-test and t-test-like approaches are typically used to identify the DEGs. However, the conventional t-test suffers from two types of drawbacks: One is multiple testing problems, for example, in microarray data having thousands of genes the t-test needs to be performed thousands of times, separately. Therefore, some genes with large t-statistic are false selected as DEGs because of the low variance of gene expressions. Another drawback of the t-test occurs with small sample sizes, which implies low statistical power. The extended version of t-test-like significance analysis of microarrays (SAM) and linear models for microarray (LIMMA) overcomes the shortcomings of t-test [5][6][7][8]. The analysis of variance (ANOVA) is widely used to microarray data to identify genes with more than two conditions [9,10]. The parametric approaches often rely on the assumption of normality of the data. While nonparametric approaches like Kruskal-Wallis (KW), Wilcoxon test [11,12], and SAM do not rely on such assumptions. Nowadays, researchers are also interested to finding cluster-based DEGs [13].
However, most of the methods discussed earlier are suitable only for the unpaired independent samples to detect DEGs. They are not suitable for the datasets generated from the paired correlated samples to detect DEGs [14,15]. Due to the heterogeneity of cancer genetics, paired samples measurements were used to detect DEGs by few researchers [16,17]. In paired data analysis, the repeated measurements within the subjects are considered rather than the across subjects.
Moreover, most of the statistical methods discussed earlier control type I (false-positive) error rate such as family-wise error rate (FWER) or false discovery rate (FDR) to avoid multiple testing problem [18]. However, they do not give much attention on controlling type II (false-negative) error rate which usually occur in presence of outliers with low expressions of DEGs. Thus the researchers may lose some important DEGs by the existing methods since they are not robust against outliers. Parametric approaches, like hybrid ANOVA [19], and nonparametric approaches, like KW [11], are robust against outliers for large sample cases, but they show weak performance in small sample cases.
Furthermore, all the methods discussed earlier make the decision of DEGs independently by their own criteria. The p-value-based statistical tests sometimes detect significant DEGs with small fold changes due to smaller pooled variance of the gene expressions, which occurs when variance for each individual condition becomes smaller. However these small fold change values may not be biologically meaningful [20]. So a contradiction between statistical and biological point of view is created to select the DEGs. To overcome this problem, some methods were proposed that take variability of the gene expressions level into account with FC information such as rank products (RP), t-tests relative to a threshold (TREAT), π-value, and fold change rank ordering statistics (FCROS) [20][21][22][23]. However these methods show weak performance in small samples in the presence of outliers. Therefore, in this paper an attempt is made to develop a hybrid robust SAM-FC approach by combining rank of FC values and rank of p-values based on SAM statistic using minimum β-divergence method to overcome all the aforementioned problems. The proposed method is designed for paired samples and can also be applied for equal sample size of two conditions. The formulation of the proposed algorithm will be discussed in the next section and the performance of the proposed algorithm in a comparison of the other methods as early mentioned using both simulated and real gene expression profiles of cancer diseases will be discussed in the results and discussion section.

Materials and Methods
Let x gik be the gth gene expression values of kth replicates in the ith condition (g = 1, 2, ..., G; i = 1,2; k = 1, 2, ..., n; n = n 1 = n 2 ), which follows normal distribution with mean µ gi and variance σ 2 . That is Then the FC values of the gth gene is denoted as d gk = x g1k − x g2k (k = 1, 2, ..., n) and also follow N (µ g , σ g 2 ); where, µ g = µ g1 − µ g2 and σ g 2 > 0, respectively. Now we want to test Which implies that H 0 : µ g1 = µ g2 vs H 1 : µ g1 µ g2 assuming that σ g 2 > 0 is unknown. If H 0 is accepted, the gth gene is said to be equally expressed (EE) gene; otherwise it is DE gene. The simple statistical method for detecting DE gene from the paired samples is the t-test. The formula for the one sample t-test is as follows Which follows the Student's t distribution with (n − 1) degrees of freedom. Where, r g and s g are the estimates of µ g and σ g , respectively. However the identification of DEGs using a t-test is highly sensitive to outliers, and it also shows poor performance when the sample size is small. The SAM based on modified t-statistic overcomes the shortcomings of the t-test in microarray experiments with small sample sizes [7]. The SAM algorithm for paired samples was developed with the following modifications of the t-statistic.
where, s * g = s g √ n and s 0 represent the percentile of the distribution of sample standard deviations, which is chosen in a way so that the coefficient of variation of u g is minimized. However, this modified statistic (6) for SAM is also sensitive to outliers. Therefore, in this paper, we update Equation (6) using minimum β-divergence estimators instead of maximum likelihood estimators (MLEs) in the presence of outliers. The minimum β-divergence estimatorsθ g,β = μ g,β ,σ 2 g,β of the parameter θ g = µ g , σ 2 g are obtained iteratively as follows where, which is known as β-weight function [24,25]. This β-weight function (9) produces larger weights (≤1) corresponding to usual/normal expressions and smaller weights (≥0) corresponding to unusual/outlying expressions of d gk . Note that the minimum β-divergence estimatorsθ g,β = μ g,β ,σ 2 g,β reduce to MLEŝ θ g = μ g ,σ 2 g for β = 0. Since the MLEs of Gaussian distribution are consistent and asymptotically efficient, we used MLEs in Equation (6) in the absence of outliers. We employ the β-weight function (9) as a measure of outlier detection by setting β = 0.2. The robustness and efficiency of the estimators depends on the tuning parameter β. Therefore, it is very important to select β properly. There are several existing approaches for selection of tuning parameter β including leave-one-out (LOOCV) and K-fold cross-validation (CV) [24,25]. In the absence and presence of outliers, the CV approach produces β = 0 and β > 0, respectively. The minimum β-divergence estimators in Equations (7) and (8) reduce to MLEs in Equations (4) and (5) with β = 0, as earlier mentioned. So we can apply the CV approach. However, this approach needs to execute thousands of time since microarray data usually contains thousands of genes. Therefore, CV approach may prove conservative in this case. For this reason, in this paper we consider outlier detection approach based on β-weight function with β = 0.2. This kind of β selection was also used in Shahjaman et al. (2017) [26].
Fold change gene expression, d gk , is categorized by the β-weight function with β = 0.2 as follows where, the δ 0 is determined by the following equation.
Now we calculate the β-FC values for the gth gene as follows FC g,β = μ g,β , in presence of outlierŝ µ g , in absence of outliers (15) Which reduces to the classical FC as FC g =μ g for β = 0.

Selection of Top Differential Gene Expressions
Our proposal is to select the top DEGs by combining both the rank of β-FC and rank of β-SAM values using the following steps.

3.
Calculate the average of both ranks for each gene.

4.
Arrange the average rank values in ascending order.

5.
Select first N ordered genes as top DE genes such that β-SAM of Nth ordered gene produces p-value < 0.1.

Performance Assessment
Let us consider a two-class prediction problem such as DEG or EEG. The outcomes are then divided into four categories: (i) DEGs are correctly predicted as DEGs (true positives: TP), (ii) DEGs are incorrectly predicted as EEGs (false-negatives: FN), (iii) EEGs are correctly predicted as EEGs (true negatives: TN), and (iv) EEGs are incorrectly predicted as DEGs (false-positives: FP). To investigate the performance of algorithms for detection of differential gene expression, we employed the following measures using the above categories.

Results
We demonstrated the performance of the proposed method in a comparison with other popular existing methods, such as t-test [9], SAM [7], Wilcoxon [12], and LIMMA [5], and fold change-based methods WAD [4], RP [21], and FCROS [22], using both simulated and real gene expression data. We used six R packages of others methods such as stats, samr [8], TCC [27], fcros [22], limma [6], and RankProd [21]. We employed all the seven methods including our proposed method in the one simulated and two microarray gene expression datasets to evaluate the performance of these methods. The description of these datasets is as follows.

Data Type 1: Simulated Gene Expression Profiles
We generated 100 datasets from the model described in equation (1) for both small (n 1 = n 2 = 3) and large sample (n 1 = n 2 = 15) cases, respectively, for the arbitrary values of (µ 1 , µ 2 ) ∈ (3, 5) and σ 2 = 0.05. Each dataset represented the gene expression profiles for 10,000 genes with (n = n 1 = n 2 ) samples. Among the 10,000 genes represented in each dataset, the numbers of DEGs were set to 300 (proportion of DEGs = 0.03) and the other 9700 genes were considered EEGs. To investigate the robustness performance of all the methods, we contaminated each of 100 dataset by outliers. In the contaminated dataset, we contaminated one or two expressions for each gene across the genome out of (n 1 + n 2 ) expressions by outliers. We generated an outlying expression in the ith condition for gth gene using x * gik = d + 2 × max(x gik ; k = 1, 2, . . . , n i ; i = 1, 2; where, d ∈ (5, 10) is an arbitrary value. Then we replace x gik by x * gik in the data matrix to get the contaminated dataset. Data Type 2: Platinum Spike Gene Expression Profiles This dataset consists of 18 spike-in samples with 9 controls versus 9 tests [28]. This dataset can be downloaded from the Gene Expression Omnibus website under the accession number GSE21344. Using RMA preprocessing and filtering of this dataset, we retained 18,707 probes, among which 1944 probes are known as the DEGs under spiked-in fold changes of 0.25, 0.28, 0.40, 0.66, 0.83, 1.5, 1.7, 2, 3, and 3.5. To investigate the performance of the proposed method in a comparison of the other methods in presence of outliers with the real dataset, we contaminated this dataset by outliers, where outliers are generated as data type 1.

Data Type 3: Head and Neck Cancer Gene Expression Profiles
Gene expression profiles were collected from 42 paired (from the same patient) samples of head and neck squamous cell carcinoma (HNSCC) and normal tissue. We used the normalized head and neck cancer (HNC) dataset [29]. We downloaded this dataset from GEO website with accession number GSE663. After preprocessing of RNA samples using Affymetrix GeneChip CELL files were obtained. Further, Robust Multichip Analysis (RMA) was used to obtain signal to probes from the CELL file. This dataset contains 12,642 probe sets.

Performance Evaluation Based on Simulated Gene Expression Profiles
Sometimes, statistical methods select DEGs who have no biological significance (genes with small fold changes). Hence, researchers now require genes who satisfy both statistical significance (p-value < 0.05) and biological significance (absolute FC > 1.5 or 2). For example Mollah et al. (2015) declared genes to be DEGs if they show absolute FC > 2 and also satisfy adjusted p-value < 0.05 [19]. Xiao et al. (2014) introduced a score that combine both the FC and p-value together [23]. They found that this combination of gene ranking provided much better results than only p-value-based ranking. However, sometimes, existing methods fail to meet their interests due to smaller variance of gene expressions or in presence of outliers. For example, a gene satisfying 0 < absolute FC < 1.5 may declared as DEG with adjusted p-value < 0.05 obtained by the existing popular statistical methods (t-test, SAM, LIMMA, and Wilcoxon) in absence of outliers for both small-and large sample cases (see Figure 1a,b), which leads to increase the FDR. On the other hand, a gene with absolute FC > 1.5 may be declared as EEG with adjusted p-value > 0.2 in presence of outliers for both small-and large sample cases (see Figure 1c,d), which leads to increase the false omission rate (FOR) by each of the four methods, except KW for large sample cases. The other methods that combine p-value and FC value are [20][21][22][23], but they show weak performance when the sample size is low in the presence of outliers. We can also visualize the DEGs by considering both p-value and FC value using Volcano plot [30]. In this plot minus log10(p-value) is plotted against log2FC and usually the genes showing values above certain thresholds of the two quantities are called DEGs. To demonstrate the performance of traditional SAM and the proposed robust SAM using the Volcano plot we generated a dataset using (1). This dataset consists of 1000 genes including 50 upregulated DEGs and 50 downregulated DEGs. Figure S1a in supplenetary file 1 shows the Volcano plot in presence of 10% outliers for small sample cases. In this plot the blue, green and black symbols represent the upregulated DEGs, downregulated DEGs, and EEGs, respectively. The p-values were calculated by the traditional SAM method. We clearly observe that there are many outlying DEGs those are not satisfy the criteria of p-value < 0.05 (horizontal gray dot line) and absolute FC > 1.5 (magenta vertical line). We also observe that there are few EEGs that are declared as DEGs by this criteria. Thus using the traditional SAM method we fail to detect the true DEGs in presence of outliers. Figure S1b represents the Volcano plot by our proposed method. From this plot it is evident that our proposed method can detect the true DEGs in presence of outliers by combining FC values and p-values. Therefore, in this paper, we developed a hybrid robust SAM-FC approach by combining rank of FC values and rank of p-values based on SAM statistic using the minimum β-divergence estimation to overcome the problems as mentioned above. Figure 2a,b represents the M-A plot of a dataset, which is one of the 100 simulated datasets of data type 1 for the small sample case (n 1 = n 2 = 3) in absence and presence of one outlying sample with each of 5% genes, respectively. The circle and asterisk symbols in the Figure 2a indicates that DEGs and EEGs, respectively. In Figure 2b, the red circle and red asterisk symbols indicate the outliers with DEGs and EEGs, respectively. To investigate the outlier detection performance of the proposed method, we present Figure 2c,d, corresponding to absence and presence of the outlying dataset (see Figure 2a,b), respectively. In these figures, we calculated the β-weights for each gene based on the n = 3 FC observations corresponding to n 1 = 3 samples of first condition and n 2 = 3 samples of second condition using the Equations (7) and (8) in Equation (9). Then we calculate the smallest β-weight for each gene. The predicted and observed distributions of the β-weights are shown in Figure S2 in the supplementary file 1. The Figure 2c,d represent the scatter plot of the smallest β-weight for each of 10,000 genes against their ordered gene index. The gray line (see Figure 2c,d) indicates the threshold line of the cut-off value δ, which is computed using (11). From Figure 2c we observed that all the smallest β-weights for each gene are above the threshold line with δ = 0.2, which indicates that there is no outlying genes in the dataset. In the Figure 2d, we see that some smallest β-weights (red colour) are below the threshold line with δ = 0.2, which indicates that the genes associated with these smallest β-weights are outlying genes. Thus our proposed method can detect the outliers and unify both classical and robust estimates in Equation (13).     Performance Evaluation Based on Simulated Gene Expression Profiles Using Data Type 1 To investigate the performance of the proposed method for the detection of DEGs in a comparison with seven popular methods (t-test, SAM, LIMMA, Wilcoxon, WAD, RP, and FCROS), we computed average FDR corresponding to the top 300 estimated DEGs based on 100 datasets of data type 1 for each method, in absence and presence of one or two outlying expressions for each gene across the genome for both small (n 1 = n 2 = 3) and large sample (n 1 = n 2 = 15) cases, respectively (which is shown in Figure 3). The p-values were obtained from each of the methods are adjusted by the Benjamin-Hochberg (BH) method for multiple testing correction. We declare a gene as DEG if the adjusted p-value corresponding to this gene is less than 0.05. From Figure 3a,b, we observe that all the eight methods performed almost equally in the absence of outlying sample for both small and large sample cases, respectively. However, for the small sample case, our proposed method produces slightly better performance than the other methods. However in presence of outliers the proposed method produces much smaller FDR than the other methods, for small sample case (see Figure 3c). In the case of large sample (see Figure 3d) in presence of outliers, Wilcoxon, FCROS, RP, and the proposed method show better performance compare to the other methods (t-test, SAM, LIMMA, and WAD). We also computed the average values of different performance measures (TPR, FPR, TNR, FNR, FDR, MER, AUC, and pAUC) to construct the Table 1. In this table the results inside the parenthesis (.) indicate the summary statistics in presence of outliers. From this table we also observe that the proposed method produces much better results than the other methods since it produces larger values of TPR, TNR, AUC, and pAUC and smaller values of FNR, FPR, FDR, and MER for all the cases. Furthermore, we performed a proportion test to determine the statistical significance of several proportions produced by the eight methods for each of the performance measures. The p-values of Table 1 were obtained by the Pearson's chi-squared test statistic [31]. From this results we can conclude that since all the p-values corresponding to each performance measures except FPR (in presence of outliers) are less than 0.01 in small samples for both in absence and presence of outliers, therefore they are highly statistically significant. In this case we also found that the FPR (p-value < 0.05) is statistically significant at 5% level of significance in presence of outliers. On the other hand, for large sample case in absence of outliers all the performance measures show statistically insignificant, whereas, in the presence of outliers, all the performance measures are statistically significant at the 1% level of significance. Figures S3 and  S4 display the box plot of AUC and MER values based on 100 datasets of data type 1. The plot of FNR against FPR and ROC curve are also shown in Figures S5 and S6, respectively. Similar results were found from these figures like FDR plot. Thus on an average the proposed method outperforms other methods.

Performance Evaluation Based on Real Gene Expression Profiles
We used two microarray gene expression datasets to evaluate the performance of the proposed method in a comparison of the other methods as discussed earlier. The first dataset is the platinum

Performance Evaluation Based on Real Gene Expression Profiles
We used two microarray gene expression datasets to evaluate the performance of the proposed method in a comparison of the other methods as discussed earlier. The first dataset is the platinum spike gene expression profiles [28] (data type 2) and second dataset is the head and neck cancer gene expression profiles [29] (data type 3).

Performance Evaluation Based on Platinum Spike Gene Expression Profiles
This dataset of data type 2 contains 18,707 probes with 18 spike samples, nine in each condition. There were 1944 genes recommended as valid DEGs under spiked-in fold changes of 0.25, 0.28, 0.40, 0.66, 0.83, 1.5, 1.7, 2, 3, and 3.5, and the other 16,743 genes were recommended as EEGs. To investigate the performance of the proposed method in a comparison of the other seven methods (t-test, Wilcoxon, SAM, LIMMA, WAD, RP, and FCROS) with the original and contaminated datasets, we employed each of these methods in both datasets to identify the DEGs. We computed different performance indices (TPR, TNR, FPR, FNR, FDR, MER, AUC, and pAUC) based on top 1944 estimated DEGs by each method. The DEGs were selected using adjusted p-values < 0.05 criterion for each of the methods. The p-values were adjusted by the Benjamin-Hochberg (BH) method. The summary statistics of these performance indices is presented in Table 2. The results inside the parenthesis (.) indicate the summary statistics for the contaminated dataset. We observed that all the methods produce almost similar results with the original dataset. However, the proposed method produces slightly better performance than the other methods.  Table 2 were obtained from this test statistic. From this table we observed that in absence of outliers the p-values for each of the performance measures show statistically insignificance (p-value > 0.05). This indicates that the performances of all methods are similar in absence of outliers. Whereas, in presence of outliers, they show statistically significant at 1% level of significance. This indicates that the proposed method outperformed other methods in presence of outliers. The outlier detection performance using β-weight function of the proposed method is demonstrated with the contaminated dataset. These results are displayed in Figure S7 in supplementary file 1. Figure S7a shows the scatter plot of the smallest β-weight for each gene with its index and Figure S7b show the scatter plot of ordered smallest β-weights corresponding to their gene index. We considered the minimum value of β-weights of n = 9 expressions as the smallest β-weight for each gene in both figures. Outlying genes are indicated by the red colour. Thus we observed in Figure S7a that the β-weight function produces smaller weights for outlying genes, where an outlying gene is defined if its smallest β-weight is smaller than the cut-off value, δ = 0.2 (see Equation (11)).
To investigate the performance of the proposed method in a comparison of the other methods for the small sample case in both original and contaminated datasets, we generated 100 bootstrap datasets from each of the dataset as discussed earlier. In each bootstrap dataset, we selected three expression data points randomly from each condition for each gene. Then we estimated the performance indices by each of the methods as before. Note that the bootstrap samples were considered to increase the accuracy and precision of the estimates. It is noticeable from Table 3 that all the methods produce similar results with the original dataset except Wilcoxon, whereas the proposed method produces much better results with the contaminated dataset. In addition, we conducted a statistical test (as before) to determine the significance of several proportions produced by the eight methods for each of the performance indices. Since all the p-values in Table 3 are less than 0.01, so we can conclude that the performance results are highly statistically significant. This indicates that proposed method outperformed other methods. Table S1 in the supplementary file 2 represents the comparative results of the eight methods with the valid differentially expressed (DE) gene-set in both original and contaminated datasets for sample sizes n 1 = n 2 = 9 and 3, respectively. These results also supported the results of Table 2; Table 3.
Thus from this study we may conclude that the performance of the proposed method improves over the other methods.  Table 3. Performance evaluation based on spike gene expression profiles using data type 2 for small sample cases n 1 = n 2 = 3.

Performance Evaluation Based on Head and Neck Cancer Gene Expression Profiles
To investigate the performance of the proposed method in a comparison of the seven popular methods (t-test, Wilcoxon, SAM, LIMMA, WAD, RP, and FCROS), we directly applied this methods in this dataset to identify the DEGs. Figure 4a,b represents the Venn diagram of the top 50 DEGs detected by the t-test, SAM, LIMMA, and Proposed and WAD, RP, FCROS, and Proposed, respectively. We used the Benjamin-Hochberg (BH) method to adjust the p-values for all methods. From Figure 4a, we notice that there are 34 DEGs that are common to all the four methods (t-test, SAM, LIMMA, and Proposed). The proposed method identified 15% outlying genes using the β-weight function. Figure 4c shows the scatter plot of ordered smallest β-weight for each gene with its index and the Figure 4d show the histogram of β-weights. In both figures the outlier genes are indicated by red color. From Figure 4a,b we observe that the proposed method shares more genes with the other methods. The proposed method identified two (2) DE genes (CYP3A4 and NOVA1) that were not detected by the other three methods (t-test, SAM, and LIMMA). Among these two genes, CYP3A4 is outlying gene. To explore the biological functions of these genes (CYP3A4 and NOVA1), we used WebGestalt2 software package [32]. From this website we obtained GO (Gene Ontology), KEGG pathway, and disease association results. From the GO database we revealed that these genes are involved in different biological processes, such as metabolic process, biological regulation, cell communication, response to stimulus, etc., and different molecular processes such as nucleic site binding, protein binding, lipid binding, oxygen binding, ion binding, and so on (see Files S1 and S2 (.xls) in supplementary file 2). Using the KEGG database, we found that these two genes are significantly enriched in linoleic acid metabolism, drug metabolism, steroid hormone biosynthesis, retinol metabolism, bile secretion, and metabolic pathways (see Table 4). The disease association results of these two genes are summarized in Table 5 and revealed that these genes are associated with some cancer-related diseases such as prostate cancer, osteosarcoma, mammary neoplasms etc. The last column of the tables represents the adjusted p-values. The p-values were obtained from the hypergeometric test and adjusted by Benjamin-Hochberg method. Therefore, we may conclude that these two genes (CYP3A4 and NOVA1) identified by the proposed method may be cancer biomarker as they are involved in different cancer-related pathways. A protein-protein interaction (PPI) is constructed in Figure 5 to show the relationship interactions between the DEGs using BioGRID web server database [33]. In addition, we performed a prognostic power analysis through multivariate Cox regresssion as implemented in SurvExpress [34] using an independent RNA-seq dataset obtained from The Cancer Genome Atlas (TGCA). This dataset consist of 283 samples. The estimated survival probabilities are represented by Kaplan-Meier plot in Figure 6. From this analysis we demonstrate that the combined effect of these 2 genes has prognostic capability in HNC with a hazards ratio of 1.59 and log-rank p-value = 0.012. Furthermore, to propose candidate drugs and association of drugs with the identified genes, we used GLAD4U and Drug bank database. 12 statistically significant drugs were obtained interacting two genes (adjusted p-value < 0.05) ( Table 6). The identified drugs were (darunavir, Cremophor EL, nilvadipine, alprazolam, felodipine, triazolam, androgen, progestogen and estrogen in combination, quinine and derivatives, desloratadine, acetaminophen glucuronide, clarithromycin, erythromycin, and paliperidone).

Discussion
Identification of DEGs across two or more biological conditions is an important task for microarray data analysis. The simplest method for detecting DEGs with biological significance is fold change (FC). However this approach does not take variability into account. Therefore, the statistical t-test is extensively used for identification of DEGs by considering the variability in the

Discussion
Identification of DEGs across two or more biological conditions is an important task for microarray data analysis. The simplest method for detecting DEGs with biological significance is fold change (FC). However this approach does not take variability into account. Therefore, the statistical t-test is extensively used for identification of DEGs by considering the variability in the

Discussion
Identification of DEGs across two or more biological conditions is an important task for microarray data analysis. The simplest method for detecting DEGs with biological significance is fold change (FC). However this approach does not take variability into account. Therefore, the statistical t-test is extensively used for identification of DEGs by considering the variability in the gene expression in microarray data. However, t-test suffers from multiple testing problems and small sample sizes. The significance analysis of microarrays (SAM) is very popular method for detecting the DEGs. SAM explicitly addressed the issues of t-test. However, it is not robust against outliers. Nowadays, the combination of FC and p-value-based methods is proven better than FC or p-value-based methods for gene selection and ranking. There are some methods that combine the p-value and FC value such as TREAT, π-value, RP, and FCROS. In TREAT a threshold was introduced between the gaps of average in the Student's t-test [20]. A score was calculated by combining FC with a two samples statistical test p-value, which they called π-value [23]. However, the problems of multiple tests arise using these tests since these methods calculate their probabilities independently for each gene [21]. In RP the FCs are ranked in deceasing order and a product of the ranks (RP) for each gene is calculated. FCROS method has solved the problems of multiple tests [22]. They exploited variations in calculated FC levels using combinatorial pairs of control/test samples. However, the performance of these methods also deteriorated with the lower number of replicates in the biological conditions, in the presence of outliers. Therefore, in this paper, we have proposed a new hybrid robust SAM-FC approach by combining rank of FC values and rank of p-values based on SAM statistic using minimum β-divergence estimators for paired data to solve the aforementioned problems. The proposed method unifies both classical and robust estimate in a way such that in absence of outliers the MLEs are used and in presence of outliers minimum β-divergence estimators are used to obtain FC-values and p-values from SAM statistic under the null hypothesis of zero mean. The simulated and real Spike gene expression profiles analysis results showed that all eight methods (t-test, Wilcoxon, SAM, LIMMA, WAD, RP, FCROS, and Proposed) performed almost similar in the absence of outliers for both small and large sample cases. Four methods (Wilcoxon, RP, FCROS, and Proposed) perform better compared to the other four methods (t-test, SAM, LIMMA, and WAD) for large sample cases in the presence of outliers. However, the proposed method performs much better than the other seven methods (t-test, Wilcoxon, SAM, LIMMA, WAD, RP, and FCROS) for small sample case in presence of outliers. From HNSCC gene expression profiles, the proposed method detected additional two (2) DEGs that were not detected by three traditional methods (t-test, SAM, and LIMMA). Using the KEGG pathway enrichment analysis, we revealed that these two additional DEGs (NOVA1 and CYP3A4) are involved in some important pathways such as linoleic acid metabolism, drug metabolism, steroid hormone biosynthesis, and metabolic pathways. Using survival analysis through the Kaplan-Meier curve we have found that combined effect of these two genes has prognostic capability and they might be promising biomarker of HNC. In addition, the present study also revealed association of these two genes with 12 candidate drugs. Therefore, we think that these 12 candidate drugs might be potential therapeutic targets and precision therapeutic strategy in cancer. We have designed the proposed method especially for paired samples or equal sample sizes of two conditions. For unequal samples we recommend to use our previous paper, robust SAM [26]. Our future interest is to apply the proposed method in other cancer datasets.

Conclusions
In this paper we proposed a new hybrid robust SAM-FC approach to obtain a robust score using rank of FC values and rank of p-values based on SAM statistic by minimum β-divergence estimators. We used β = 0.2 for the measure of outlier detection. The proposed method uses the β-weight function to unify both classical and robust estimates to calculate the robust score. We examined the performance of proposed method in a comparison of some popular methods (t-test, Wilcoxon, SAM, LIMMA, WAD, RP, and FCROS) using both simulated and real gene expression profiles for both small-and large sample cases. We observed that the proposed method outperforms other methods for small sample case in presence of outliers and it keeps almost equal performance with other robust methods (Wilcoxon, RP, and FCROS) otherwise. Both types of data analysis results showed that the performance of the proposed method improves over the other methods for both small-and large sample cases. Therefore, our proposal is to use the proposed method instead of existing methods to obtain the better performance for identification of cancer biomarkers and candidate drugs.

Supplementary Materials:
The following are available online at http://www.mdpi.com/1010-660X/55/6/269/s1. Supplementary file 1: Figure S1: Volcano plot in presence of 10% outliers for simulated dataset. Figure S2: Predicted distribution of β-weights using data type 1 for small sample case n1 = n2 = 3 with m = 2 conditions. Figure S3: Performance evaluation using boxplot of AUC values estimated by different methods for data type 1. Figure S4: Performance evaluation using boxplot of MER values estimated by different methods for data type 1. Figure S5: Plot of FNR versus FPR estimated by different methods using data type 1 with m = 2 conditions. Figure S6: ROC curve produced by different methods using data type 1 with m = 2 conditions. Figure