Multi-TGDR: A Regularization Method for Multi-Class Classification in Microarray Experiments

Background As microarray technology has become mature and popular, the selection and use of a small number of relevant genes for accurate classification of samples has arisen as a hot topic in the circles of biostatistics and bioinformatics. However, most of the developed algorithms lack the ability to handle multiple classes, arguably a common application. Here, we propose an extension to an existing regularization algorithm, called Threshold Gradient Descent Regularization (TGDR), to specifically tackle multi-class classification of microarray data. When there are several microarray experiments addressing the same/similar objectives, one option is to use a meta-analysis version of TGDR (Meta-TGDR), which considers the classification task as a combination of classifiers with the same structure/model while allowing the parameters to vary across studies. However, the original Meta-TGDR extension did not offer a solution to the prediction on independent samples. Here, we propose an explicit method to estimate the overall coefficients of the biomarkers selected by Meta-TGDR. This extension permits broader applicability and allows a comparison between the predictive performance of Meta-TGDR and TGDR using an independent testing set. Results Using real-world applications, we demonstrated the proposed multi-TGDR framework works well and the number of selected genes is less than the sum of all individualized binary TGDRs. Additionally, Meta-TGDR and TGDR on the batch-effect adjusted pooled data approximately provided same results. By adding Bagging procedure in each application, the stability and good predictive performance are warranted. Conclusions Compared with Meta-TGDR, TGDR is less computing time intensive, and requires no samples of all classes in each study. On the adjusted data, it has approximate same predictive performance with Meta-TGDR. Thus, it is highly recommended.


Introduction
Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in areas of biomedical research and translational medicine. Selecting a small number of relevant features (e.g., genes in transcriptomics profiles, SNPs in GWAs studies, and metabolites in metabolomics) to build a predictive model that can accurately classify samples by their diagnosis (e.g., diseased or health, different stages of one specific cancer) and prognosis (e.g., potential response to a given treatment, 5-year survival with a certain treatment) is an essential step towards personalized medicine. In bioinformatics, such a task is accomplished by a feature selection algorithm, which besides reducing over-fitting and improving classification accuracy, leads to small molecular signatures with manageable experimental verification and the potential design of cheap dedicated diagnostic and prognostic tools.
Among dozens to hundreds of proposed feature selection algorithms [1][2][3], the Threshold Gradient Decent Regularization (TGDR), proposed by Friedman and Popescu [4], stands out because of the elegant theory beneath them, its easy-to-moderate programming difficulty for a well-trained statistician and its good performance and biologically meaningful results in real-world applications. TGDR builds upon the classical gradient descent by introducing a threshold parameter that directs the paths towards a parameter with more diverse components. Ma and Huang [5] elegantly extended the TGDR to the case where expression data from several studies are combined. The proposed algorithm, the Meta Threshold Gradient Descent Regularization (Meta-TGDR), assumes that the same set of genes is selected on all studies, while allowing the b coefficients to vary across studies, in a meta-analysis fashion. In their paper, they demonstrated that a better classification performance was achieved by using Meta-TGDR rather than by using TGDR on the combined data set.
However, both TGDR as originally proposed, and Meta-TGDR frameworks do not give the explicit definition or/and format on the multi-class classification where an observation needs be categorized into more than two classes. Additionally, Meta-TGDR [5] does not offer an overall predictive rule on an independent data set (testing samples), from a study not used in classifier training/estimation. The absence of such rule prevented Meta-TGDR from the evaluation of its performance on independent testing sets, and the comparison with TGDR in terms of predictive performance. Furthermore, it limited the use of the Meta-TGDR to real, clinical practice application precluding its use in personalized medicine, where a classifier is built for use in an extended population under variable laboratory setting.
In this paper, we specifically addressed the first issue by proposing a new framework, referred as to multi-TGDR, and the second issue by proposing an equation. Lastly, the results from Meta-TGDR, and TGDR on the pooled data were compared in terms of their predictive performance.

Materials and Methods
The proposed extensions to TGDR and meta-TGDR In order to establish the nomenclature to be used through the paper and to help the reader experience we start with a brief description of the Meta-TGDR as below. The interested readers are referred to [5] for more details on both TGDR and Meta-TGDR frameworks.
Let Y m~( Y m 1 ,:::,Y m nm ) be the indicator function (i.e., 0 if in the reference class, 1 otherwise) for each study m = 1,.,M with n m subjects and X m~( X m 1 ,:::,X m nm ) the vector of n m xD matrices representing the gene expression for each subject over the same set of D genes. The likelihood function for study m can be written as: where b m 0 and b m~( b m 1 ,:::,b m D ) are the unknown intercept and expression-coefficients for each study s. These parameters will be simultaneously estimated by maximizing the overall likelihood function R(b) = R 1 (b 1 )+…+R M (b M ) with b~(b 1 ,:::,b M )where only the coefficients associated with gene expression are subject to regularization, Denote Dv as the small positive increment (e.g., 0.01) as in ordinary gradient descent searching, v k = k6Dv the index for the point along the parameter path after k steps; and b(v k ) the parameter estimate of b corresponding to this point. For a given fixed threshold 0#t#1, the Meta-TGDR algorithm iterate on the following steps: Define the meta-gradient G (v) as a D-dimensional vector whose j th -element is the sum of the gradient for each study; i.e., where the product of f and g is component-wise. 5. Steps 2-4 are iterated k times. Both t and k are tuning parameters and determined by cross validation.
In both TGDR and Meta-TGDR, the search path is determined by the gradient g (i.e., both the direction and magnitude), and the threshold vector f(v) that establish which features are selected through the tuning parameter t. Uniquely to the Meta-TGDR, the concept of meta-gradient G, defined as the sum of each study's negative gradient, was introduced to define the direction of the search path. In each iteration, features with the largest meta-gradient would be selected, with the number of features being regularized by t. If t = 0, all features are included in the model whereas a t = 1 implies that only the feature with the largest meta-gradient is selected. By this way, Meta-TGDR selects the same set of features, but their corresponding coefficients may differ across studies.

Multi-TGDR:
Extension to multi-class classification. Multiple-class classification is commonly encountered in real world, however, many proposed feature selection algorithms lack the valuable capacity of dealing with multi-class classification. In the original TGDR and Meta-TGDR framework, multi-class classification had been left untouched even though all authors claimed that such extension is very natural. Here, we propose an extension of the TGDR framework to multi-class cases.
In the multi-class scenario, the response variable Y i -representing the class membership for subject i -may take values 1,…,K, where K is the number of classes (K$3). Propositions to tackle this problem using existing binary classifiers divided into two major types: ''one-versus-the-rest'' where K binary classifiers were trained to distinguish the samples in a single class from the samples in all remaining classes, and ''one-versus-another'' where K(K-1)/2 classifiers were trained to distinguish the samples in a class from the samples in one remaining class. Many researchers [6][7][8] had demonstrated that one-versus-another schema offered better performance than one-versus-the-rest did. Therefore, we compared our proposed framework with one-versus-another schema only.
The central idea of our extension is to replace the single indicator variable Y i for each sample for a set of K-1 variables Y ik . The threshold function is then defined as the maximum along the set of local threshold functions, defined on the subspaces defined by parameters associated to each class. To our knowledge, multi-class TGDR has not been addressed; probably because it is more computationally demanding than binary TGDR.
Let Y k1 ,…,Y kn be the vector of indicators for class k across subjects; i.e., Y kj is equal to 1 if the j th subject belongs to class k and zero otherwise. This vector is defined for each class k(k = 1,…,K-1), while the K th -class serves as the reference class. In order to make our classes mutually exclusive, As before, let X 1 ,…,X n represent the gene expression values. After a simple algebraic manipulation,the log-likelihood function can be written as: b k0 's are unknown intercepts which would be not subject to regularization. b k = (b k1 ,…, b kD ) are the corresponding coefficients for expression values for the same set of D genes for the comparison between class k and reference class K. Note, the dimension of all b k is restricted to be the same, but their magnitudes differ. Hence the same number of non-zero genes is used on all classes but their estimated values are different.
Let b~b k0 f ,b k g K{1 k~1 denote the set of all parameters to be estimated in model 3, one can follow the binary TGDR procedure as detailed in [5,9,10] but introducing the following modification in the calculation of the threshold vector f(v) in step 3: Here, f ki (v) represents the threshold vector of size D for class k(k = 1,.,K-1), Then, the i th -gene specific element of threshold function f(v) will be obtained as: As in binary TGDR and Meta-TGDR, all genes are assumed to be non-informative at the initial stage. The tuning parameters, t and k jointly determine the property of the estimated coefficients (i.e., which ones would be selected, and their corresponding magnitudes). When t = 0, all coefficients are nonzero for all values of k. When t = 1, the multi-TGDR usually increases in the direction of feature (i.e., gene here) with the largest gradient in each iteration and the non-zero coefficients are few for a relatively large number of iterations. Since by definition f(v) is 1 as long as one of the elements f ki (v) is 1, it indicates that if a feature is selected because is important for, let's say, class 1 (versus reference class K), then it would appear in all the other comparisons even though its corresponding coefficients in those comparisons may not differ significantly from zero.
Here we establish a unique t tuning parameter, this assumption may be relaxed so that t can have different values for each class, which will allow different degree of regularization for different comparisons. The proposed framework is referred to as multi-TGDR.
Multi-class Meta-TGDR. The Multi-TDGR can naturally be extended to a situation with multiples studies and multiple classes. The step 3 in multi-class TGDR is combined with the concept of meta-gradient of Meta-TGDR. That means f ki is defined on the meta-gradient instead of the regular gradient, i.e., . Under this formulation, the same set of genes is selected for each class and each study but coefficients can vary across studies and classes. Obviously, with the increase on b's dimension, the iterations are more time-consuming compared to multi-class TGDR. In this paper, this framework is referred as to Meta-multi-TGDR.
Prediction of new samples using Meta-TGDR. In the Meta-TGDR, the estimated coefficients b = (b 1 ,…,b D ), corresponding to expression values for D genes selected, are different in each study. This raises a question of how to use these study-specific coefficients to perform an overall prediction in a new independent sample (''testing sample''), not previously used in the training/ estimation stage. However Ma and Huang [5] did not offer a solution to this issue, which preclude the evaluation of the performance of the Meta-TGDR (and its comparisons with TGDR) on independent ''testing'' samples. Here we extended their work by conducting a meta-regression to synthesize the results from Meta-TGDR and extrapolate the membership prediction to a sample from a new study. Under the Meta-TGDR settings, let Z ij be the estimated log odds for the j th sample in i th study using the estimated coefficientsb b i (of length D) for each study (i.e., Z ij~X i jb b i ). Z ij can be modeled as: Z ij~m0 zm 1 X 1j z:::m D X Dj ze ij i~1,:::,M j~1,:::,n i ð3Þ where m 1 ,…, m D represents the overall coefficient associated with gene i, e ij *N(0,s 2 i ) and s 2 i is the within-study variance for the study i. Specification of equation 3 can be achieved by following a 3-step procedure with the first two steps obtain estimates of s 2 i using delta method [11] and step 3 calculates the posterior probabilities using the estimated coefficients obtained in step 2.
. For each study estimate the variance of Y in natural scale by: Once we haveŝ s 2 i , the overall estimated coefficients ms can be easily obtained by a weighted least square equation (similar to the one used in fixed-effect meta-analysis).
1. Finally, the overall m i 's estimated in step 2 is used to calculate the odd-ratio and the posterior class-membership probability for a new sample.

Miscellaneous
Stabilization of the selected genes using Bootstrap aggregating. In order to improve the stability and classification accuracy of multi-class TGDR, we applied Bootstrap aggregating (Bagging) to our classifier [12]. Given a training set of size n, bagging generates m new training sets, each of size n, by sampling subjects from the original training set with replacement. Then m multi-class TDGRs are conducted using the above m bootstrap samples and combined by voting. Bagging helps to protect overfitting which usually exists in the classification setting.
Evaluation of predictive performance. The performance of a classifier is measured using traditional performance metrics over the training samples (training: # of misclassification/sample size on training set; and 5 fold-cross-validation misclassified error rate: the misclassified rate on the cross-validation data) and over the test samples (the predictive error, which is more heavily weighted than the misclassification errors when we evaluate the predictive performance of an algorithm since those samples were not used to construct the classifiers, thus less subject to overfitting). Since the membership probabilities for each sample can easily be obtained from multi-TGDR/TGDR algorithm, we also used the generalized Brier score (GBS) proposed by Yeung et al [13], a generalization of the Brier Score to a multi-class classification problem, to evaluate the performance of these algorithms (i.e., multi-TGDR, TGDR and its pairwise coupling, and Meta-TGDR). Under the K class setting, where Y ik are indicator functions of class k (k = 1,…,K), letp p ik denote the predicted probability such that Y ik = 1. For easier interpretation and comparison of GBS score across different classification settings, we normalized the GBS by the sample size n as in [14]

3.
i.e., . By taking into consideration the magnitudes of predicted probabilities, GBS, can establish difference in performance of classifiers with same overall predictive error. The smaller the GBS value, which after normalization takes values in [0,1], the better a classifier performs.

The
Experimental Data and preprocessing procedures. Psoriasis: Open-access data from 3 published studies [15][16][17] available under GEO accession numbers GSE14905, GSE13355, and GSE30999, respectively were used, including samples from Lesions (LS) and adjacent Non-Lesional (NL) skin form psoriasis patients and Normal skin from healthy patients. Details of these studies -using Hgu133plus2 Affymetrix chips-are given in [18].
Lung cancer: The lung cancer data sets included GSE10245, GSE18842, and GSE2109, all studies were performed on Affy HGU 133plus2 chips and publicly available on the GEO repository.
Pre-processing procedures: The raw Affymetrix data (CEL files) of both lung cancer and psoriasis data sets were downloaded from GEO repository and expression values were obtained using FRMA algorithm [19] [20]. To pool data from different studies together and to address the batch effects from different experiments, COMBAT algorithm [20] was used to adjust on the combined expression values for these two combined data sets. For the lung cancer data, moderated F/t-tests (limma package) were conducted to identify differentially expressed genes (DEGs) with cutoffs for FDR and fold change as 0.05 and 2, respectively. When there are multiple probesets representing the same gene, the one with the largest F-value was chosen. The resulting 949 unique genes were fed into the downstream analysis. Note, for the TGDR algorithm there is no limit on the number of genes fed into the algorithm. However, it is common practice in high-throughput experiments (e.g., Ma and Huang [5,9]) to rule out ''noninformative'' genes using a filtering procedure before the classification. By doing so, a large amount of computing time can be saved with only partial set of genes put in classifiers; but with no or least loss on the potential biomarkers since almost all genes which have high probability to be biomarkers pass the filtering. Relevantly, Tritchler et al. [21] explored the effect of filtering on some downstream analysis (i.e., clustering and network analysis) in their work and discussed in details the advantages of filtering as a preprocessing step. For psoriasis data, the filtering steps taken (including SD, ICC and DEGs using meta-analysis method) were used by us previously and described in details there [18]. Similar to the lung cancer data, conducting those filtering steps is mainly for the purpose of saving the computing time. 2301 unique genes passed the filtering were fed into multi-TGDR and Meta-TGDR algorithms.
Statistical language and packages. The statistical analysis was carried out in the R language version 2.15 (www.r-project. org), and packages were from the Bioconductor project (www. bioconductor.org). R code for multi-TGDR is available upon request.

Simulation studies
In this section, we use two simulated examples to study the empirical performance of multi-TGDR.

Example 1
In the first simulation, 1006n iid standard normal (mean = 0, variance = 1) random variables (i.e., X 1 ,…,X 100 , those are vectors of length n, n is the sample size), and n class membership outcome variables (Y 1 ,…,Y n ) taking the values of 1-3 were simulated. The logit function for class 2 and 3, having class 1 as reference, was calculated through the following relationship: where the logit for class 2 depends only on features X 1 X 2 X 3 and class 3 logit depends on features 1,2 and 4. According to this model, 50 data sets were generated and analyzed by the proposed multi-TGDR framework. Results for this simulation, summarized in Table 1A shows that almost 100% of times the relevant features were selected by multi-TGDR framework. As criticized by Wang et al [22], lack of parsimony is an obvious disadvantage of TGDR algorithms, a shortcoming inherited by the multi-TGDR. How- ever, the introduction of the Bagging procedure improves upon parsimony.

Example 2
To explore the effect of the correlations among features (i.e., independent variables) may have on multi-TGDR, we set the simulations as in the previous example but assumed the following correlations among features: cor (X 1 , X 5 ) = cor (X 3 , X 7 ) = 0.8 and cor (X 2 , X 6 ) = cor (X 4 , X 8 ) = 20.8. Table 1B presents the results for this simulation. When compared with the uncorrelated scenario of example 1, the size of final selected feature is marginally larger while the predictive errors are almost the same. Nevertheless, multi-TGDR always selected the relevant features successfully and has good predictive performance. Bagging procedure improves upon both parsimony and predictive performance. Thus, it is highly recommended to combine bagging with any TGDR algorithm although bagging is very computing-time intensive.
In high-throughput experiments, it is common practice to use low-level analysis to eliminate non-informative features before embarking on more complex, time-consuming modeling. To evaluate if the pre-processing steps used in our data (see methods section) can erroneously filter out the informative features, we reran both simulations using only the 20% top features (ranked by limma's moderated t-tests). The results show that the exclusion of non-informative features deemed by a pre-processing filtering would not degrade the predictive performance of the final models. In terms of implementation on real-world applications, this filtering step eases the computational burden of downstream analysis (i.e., TGDR and Bagging) by reducing the initial dimensions of 50 k to a few thousands; a reasonable size even for a PC.

Applications on microarray expression data
Using the real-world applications, the appropriateness and accuracy of the proposed multi-TGDR was evaluated. We also compared the performance of Meta-TGDR and TGDR under a priori batch-adjustment.

Lung cancer
About 80% of lung cancers (LC), the leading cause of cancerrelated death throughout the world, are classified as non-small cell lung carcinoma (NSCLC) with Adenocarcinoma (AC) and squamous cell carcinoma (SCC) the two major subtypes of NSCLC. SCC is characterized as a poorly differentiated tumor subtype that develops in the proximal airways and is strongly associated with cigarette smoking. In contrast, AC usually arises in the peripheral airways and is more commonly observed in nonsmokers and women. Mutations have been identified in AC and not in SCC, suggestion different mechanism of progression and treatment response.
For LC data, we randomly divided it into 4 subsets with roughly equal sizes and used 3 fold of them as the training set (n = 109) and the remained 1-fold as the test set (n = 36). Using multi-TGDR algorithm, 67 biomarkers were identified with 0% training error and a predictive error of 20.2% in a 5-fold cross-validation (CV). The comparison between pair-wise TGDRs and the multi-class TGDR is summarized in Table 2, and it shows that multi-TGDR over-perform the pair-wise strategy in all performance statistics. In terms of computing time, a single run of binary TGDR (including the determination of tuning parameters using cross-validation and model estimation) took on average 19 seconds on a MacBook equipped with double-core 1.8 GHz processors and 8GB RAM. Thus, for the LC data, the total computing time for the pair-wise TGDRs was about 2 minutes and only 77 seconds for the multi-TGDR framework, saving about 36% of the computing time.
After applying Bagging (N B = 100) to the LC data, we found that all genes in the 67-gene signature produced by multi-TGDR appear in the classifier with more than 5% frequency and 19 of them has bragging frequencies (BF) larger than 40% (Table 3). CYP24A1 is the gene most frequently selected (77%) followed by PNLDC1 (67%). By reducing the multi-TDGR signature to those genes with BF .40%, the performance showed slightly improvement by a predictive error reduction of 2.78% on the test set.
We concluded that bagging procedure discarded the random noises produced by a single run of TGDR. Furthermore, the calculation of membership probabilities in multi-class TGDR is more straightforward compared to the pair-wise coupling. Given it is intrinsically challenging to derive meaningful diagnostic signatures from high-throughput experiments in complicated problems like this, one major objective of presenting this data set is to use it as a benchmark for the development of more suitable classifiers on lung cancer subtypes and stages.

Psoriasis
Psoriasis vulgaris is a common chronic inflammatory skin disease of varying severity, characterized by red scaly plaques. Publicly data from 3 published studies [15][16][17] were used, including samples from Lesional (LS) and adjacent Non-Lesional (NL) skin of psoriasis patients and Normal skin from healthy patients.
Here, we used the psoriasis data to investigate the effect of the Batch/study adjustment in the performance of TGDR and Meta-TGDR, as well as to further evaluate multi-TGDR. Again, we randomly divided the whole data into 5 subsets with roughly equal sizes and used 4 fold of them as the training set (n = 360) and the left one fold as one test set (n = 89). By doing so, we can evaluate the validity of the proposed equation for the overall estimates since both the study-specific and overall estimates are available for the samples in this test set.
Here, we first present the performance of the binary classifiers followed by the multi-class problem. The binary classifiers (LS vs Normal, LS vs NL and NL vs Normal) will allow us 1) to assess the effect of batch adjustment on the performance of TGDR and Meta-TGDR and 2) to evaluate the validity of the method proposed here to allow prediction of independent data set in the Meta-TDGR framework.
Binary Classification Problems in Psoriasis Data. The results for all 3 comparisons were presented in Table 4A. The positive effect of batch adjustment on Meta-TDGR's performance is striking and consistent across all comparisons and datasets. When the data were adjusted for batch effect before classification, TGDR and Meta-TGDR had identical miss-classification rates on training and testing samples in all 3 comparisons and TGDR slightly outperformed in terms of GBS.
LS versus normal: Using TGDR on batch-adjusted expression from LS and Normal skin samples, we identified 30 biomarkers with a 0% and 0.86% training and CV-5 error, respectively. Bragging (N B = 100) frequencies were above 5% for all 30 genes. By considering a series of cutoff values for the frequency (5%-50%), a 30% for BF was chosen as it minimized the GBS and misclassification rates with the smallest number of non-zero genes leading to a final model with 18 genes (Table 4). Meta-TGDR (after batch adjustment) identified 22 biomarkers all with BF.5%. Cut-off for BF was set at 40% and among the 10 selected genes (see Table 5), 6 (p,0.0001) overlapped with the 18 genes in the TGDR bagging classifier.
LS versus NL: TGDR signature for LS vs NL classification included 35 biomarkers with a training error of 0% and 1.48% in 5-CV. Applying Bagging procedure (N B = 100) the final model (with BF .30%) included 22 genes (see Table 6). Meta-TGDR on the adjusted data identified a 25 genes signature, all with BF .5% and 16 of them above the selected cut-off of 40% for BF (See Table 6). There is still an impressive overlapping (n = 11) between these 16 genes and the 22 genes chosen by TGDR bagging model (Fisher's test: p,0.0001).
Although Meta-TGDR is more parsimonious, k (the number of steps\iterations) is always dramatically bigger than in TGDR for the same value of the tuning parameter t in both algorithms. One possible explanation is that by allowing different coefficients for a specific gene across studies, the direction of updating path in individual study may differ, probably leading to a cancel-out among one another. Therefore, the gradients might descend at higher speed in TGDR than meta-gradients in Meta-TGDR and the maximized likelihood value might be reached within fewer steps. Since separate gradient matrixes were calculated for each study in meta-TGDR and the tuning parameter k is always bigger (gradient matrix and threshold function must be calculated in each iteration), the computing time of meta-TGDR is expected to be substantially longer than that of TGDR (e.g., for LS vs NL comparison, meta-TGDR used 9 minutes and 42 seconds to determine on the tuning parameters and to estimate the coefficients while TGDR used 2 minutes and 46 seconds).
Psoriasis 3-class classification. Here we evaluate the performance of the multi-TGDR versus the classifier build using all pairwise binary TGDRs. The performance of the Meta-multi-TGDR for adjusted and unadjusted data is also presented.
Using multi-TGDR on the training set, we identified 60 genes with 0% training error 1.67% error in 5 fold CV. Interestingly; the number of selected genes by multi-TGDR is approximately the sum of all individualized binary TGDRs. The size went down to 39 genes (Table 6) when bagging frequency being larger than the selected cut-off (40%). Again, disposal of the low-BF genes did not hurt the predictive performance. On the contrary, it improves the predictive performance in terms of GBS for both training and testing samples.
The classifier built by combining the 3 pairwise binary TGDR had the same in-training performance as multi-TGDR using 76 genes, with 44 of them being part of the multi-TGDR classifier (p,0.0001). The inconsistency between two algorithms is partially because local optimal points in individualized binary TGDRs cannot warrant the global optimality in the multi-TGDR. With this data set, multi-TGDR and pairwise binary TGDR had similar performance while multi-TGDR was more parsimonious. Additionally, total computing time was about 8 minutes for pair-wise strategy and 4.5 minutes for the multi-TGDR framework. Multi-TGDR with Bagging provided the best performance (see Table 7 and Figure 1). This shed some evidence on appropriation and accuracy of the multi-TGDR framework. Certainly, further evaluation using independent test sets is needed.
Surprisingly, the performance of Meta-multi-TGDR, where coefficients for both classes and studies are included is not impressive. This may partially due to the fact that Meta-multi-TGDR intends to find consistent-expressed genes across all classes and studies (one possible reason why the number of non-zero genes in multi-Meta-TGDR is the smallest). Based on the analyses conducted here, we illustrated that TGDR on the adjusted data has a similar or better performance compared to Meta-TGDR, thus we think Meta-multi-TGDR, with its increase complexity and computing burden, is quite unnecessary. Nonetheless, the Metamulti-TGDR greatly improved after batch adjustment reducing training error from 18.33% to 7.11% and predictive error (on test set) from 27% to 6.74%, demonstrating that the adjustment of batch effect is imperative.

Discussion
When several microarray studies address the same or similar objectives, it is statistically more robust to carry out the analysis by pooling all studies together. To identify molecular signatures that discriminate among different disease status or stages on the pooled data, one can either apply TGDR to the batch-effect adjusted expression values for all samples, or use Meta-TGDR to select consistently informative genes and obtain the overall estimates using the procedure we proposed in the paper.
Using real-world applications, we showed that TGDR and Meta-TGDR have approximately equal predictive performance when the data has been adjusted for batch-effect. Compared to the latter method, TGDR on the adjusted data saves computing time, and do not require that all classes must be represented in each study. However, the stability of Meta-TGDR is usually better than TGDR as shown by the analyses of psoriasis data, and future work must be done to improve more on stability of TGDR. Nonetheless, applying Meta-TGDR on the unadjusted data had worse predictive performance compared to the analyses on the adjusted data. This verified our conjecture that Meta-TGDR aims mainly at selecting consistent genes across studies, with few to no capacity to adjust for a large batch-effect.
Additionally, we assembled our analyses with the Bagging procedure [12]. The benefits of Bagging including improved selection stability; more classification accuracy; and protection against over-fitting are clearly illustrated here.
Since multi-TGDR is an extension to binary TGDR, whose performance had been proved to be equal or superior to many other classification methods in the original papers [9], we did not compare the multi-TGDR with other classification methods and rather focus on the important issues addressed here: making Meta-TGDR useful in practice by offering a solution to the prediction of independent datasets, comparing TGDR and Meta-TGDR performance after batch adjustment and extending both algorithms to the multiclass setting. Future work should involve a comprehensive comparison of multi-TGDR' performance with other multi-class classification methods and the identification of data types most suitable for multi-TGDR. We currently devote ourselves to such extensive and laborious work. Additionally, as the numbers of classes in the applications presented here are not big (4 in LC and 3 in psoriasis data, respectively), future work will include some applications of the multi-TGDR framework with a large number of classes, where the performance of pair-coupling has been reported to decrease dramatically, to see if a single likelihood-based classifier like multi-TGDR can be a rescue.