Supplementary Information for “The joint lasso: High-dimensional regression for group structured data”

To further characterize how our method shares information across subgroups, we perform a pathway enrichment analysis. The aim is to determine the number of significantly enriched pathways in the ADNI dataset, based on the SNP effect sizes in the multiple regression on the outcome (cognitive decline). We do not recap the penalized regression framework, which is described in the main text; instead, we describe how p-values for pathway enrichment are derived from the ranking of effect sizes. First, a list of pathways in Homo sapiens, the proteins active in them, and the genes coding for these proteins is retrieved from the KEGG database [1]. To calculate enrichment, we require a ranking of the genes in each pathway, which means that SNPs must be associated with genes; this is done using the bitr function of the clusterProfiler package [4]. For simplicity, if more than one SNP is associated with a given gene, we retain only the SNP with the largest effect size. The genes are then ranked according to the absolute effect size of their associated SNPs; genes with no associated SNPs are assigned the lowest rank. Finally, we use the wilcoxGST function in limma [3] to run an enrichment test based on the Wilcoxon rank-sum test [2]. Only pathways with a p-value ≤ 0.05 after Benjamini-Hochberg correction for multiple testing are considered significant.
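The ranking and testing steps described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reimplementation of wilcoxGST or bitr: the SNP identifiers, gene names, pathway memberships and all function names are hypothetical, "largest effect size" is taken to mean largest in absolute value, genes without a mapped SNP are given a score of zero (so they tie with zero-effect genes at the lowest rank), and the rank-sum p-value uses a one-sided normal approximation that is crude for a toy example of this size.

```python
import math
from collections import defaultdict


def gene_scores(snp_effects, snp_to_gene, all_genes):
    """Collapse SNP effects to one score per gene: the largest |effect|.

    Assumption: genes with no mapped SNP get 0.0, i.e. the lowest rank.
    """
    scores = {gene: 0.0 for gene in all_genes}
    for snp, effect in snp_effects.items():
        gene = snp_to_gene.get(snp)
        if gene in scores:
            scores[gene] = max(scores[gene], abs(effect))
    return scores


def midranks(values):
    """Ascending ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_pvalue(scores, pathway_genes):
    """One-sided p-value that pathway genes score higher than the rest.

    Normal approximation to the Wilcoxon rank-sum statistic, with the
    variance corrected for ties.
    """
    genes = list(scores)
    ranks = midranks([scores[g] for g in genes])
    in_set = [g in pathway_genes for g in genes]
    n1 = sum(in_set)
    n = len(genes)
    n2 = n - n1
    if n1 == 0 or n2 == 0:
        return 1.0
    w = sum(r for r, member in zip(ranks, in_set) if member)
    tie_counts = defaultdict(int)
    for g in genes:
        tie_counts[scores[g]] += 1
    tie_term = sum(t**3 - t for t in tie_counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return 1.0
    z = (w - n1 * (n + 1) / 2) / math.sqrt(variance)
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy inputs (not ADNI data).
snp_effects = {"rs1": 0.8, "rs2": -1.2, "rs3": 0.1, "rs4": 0.0, "rs5": 0.5}
snp_to_gene = {"rs1": "G1", "rs2": "G1", "rs3": "G2", "rs4": "G3", "rs5": "G4"}
all_genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
pathways = {"pathway_A": {"G1", "G4"}, "pathway_B": {"G3", "G5"}}

scores = gene_scores(snp_effects, snp_to_gene, all_genes)
raw = [rank_sum_pvalue(scores, members) for members in pathways.values()]
adjusted = benjamini_hochberg(raw)
significant = [name for name, q in zip(pathways, adjusted) if q <= 0.05]
```

Working on gene-level scores keeps the test independent of how many SNPs map to each gene, which is the reason for retaining a single SNP per gene before ranking.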


Results
Although it would be possible to visualize the significantly enriched pathways, we choose not to do so for two reasons: 1) the sample size of the ADNI dataset, while large enough for the purposes of demonstrating improvements in prediction, is not large enough to allow for any reliable validation of our results on a held-out dataset, creating the risk of spurious results for any given pathway; 2) the procedure for choosing a subset of SNPs for our analysis creates the risk of biased results. While this does not affect the comparison of prediction methods, it does create a danger of over-interpreting results that may not generalize to other datasets.
Instead, we use the pathway analysis to investigate the characteristics of our method under increasing fusion of subgroup parameters, as regulated by the γ hyperparameter. Figure 1 shows how the number of significantly enriched pathways changes as γ increases. The λ parameter is determined by cross-validation as before. We note that for small γ (minimal information sharing), few pathways are identified and they differ between groups. As γ increases, additional common pathways are identified. For very large γ, the number of significant pathways decreases again. This is a consequence of the tradeoff between γ and λ: overly large γ values increase the sparsity of the vectors β_k of linear parameters for each group k. This is an interesting property of our method, but it does not adversely impact estimation as long as γ is chosen by cross-validation.
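The two quantities tracked as γ varies, the number of significant pathways per subgroup and the pathways significant in every subgroup, can be computed as in the following sketch. The function name, the γ values and the pathway names are hypothetical placeholders rather than results from the ADNI analysis, and no scaling by the number of subgroups is applied.

```python
def summarise_pathways(significant_by_group):
    """Count significant pathways per subgroup and find those shared by all.

    significant_by_group maps a subgroup label to the set of pathway names
    that passed the significance threshold in that subgroup.
    """
    per_group = {group: len(paths) for group, paths in significant_by_group.items()}
    sets = [set(paths) for paths in significant_by_group.values()]
    common = set.intersection(*sets) if sets else set()
    return per_group, common


# Hypothetical inputs for two values of gamma (placeholder pathway names).
results_by_gamma = {
    0.01: {"AD": {"p1"}, "EMCI": {"p2"}, "LMCI": set(), "CN": {"p3"}},
    1.0: {"AD": {"p1", "p4"}, "EMCI": {"p2", "p4"}, "LMCI": {"p4"}, "CN": {"p3", "p4"}},
}
summary = {gamma: summarise_pathways(groups) for gamma, groups in results_by_gamma.items()}
```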

Effect of Subgroup Weighting
The model proposed in the paper has the following criterion:

$$\hat{B} = \arg\min_{B = [\beta_1, \ldots, \beta_K]} \sum_{k=1}^{K} \frac{1}{n_k} \lVert y_k - X_k \beta_k \rVert_2^2 + P_{\lambda,\gamma}(B),$$

where y_k and X_k denote the responses and design matrix of subgroup k, n_k is its sample size, and P_{λ,γ}(B) stands for the sparsity (λ) and fusion (γ) penalty terms defined in the main text.
It would be possible to define a very similar model which excludes the 1/n_k terms, i.e.:

$$\hat{B} = \arg\min_{B = [\beta_1, \ldots, \beta_K]} \sum_{k=1}^{K} \lVert y_k - X_k \beta_k \rVert_2^2 + P_{\lambda,\gamma}(B).$$

The difference is that the first model weights the sum of squared errors by the number of samples in each subgroup (in effect using the mean squared error), while the second model does not. The unweighted approach can lead to sub-optimal performance in pathological cases where the group sizes differ drastically. To demonstrate this, we performed a small simulation study (Figure 2) with three subgroups, where two share the same coefficient vector β_1 and a third, divergent subgroup has coefficient vector β_2. We set λ = 0 and γ = 1 for the unweighted model in a deliberate misspecification to force fusion of the coefficients. In the subgroup weighting model, we set γ = k/n, which ensures comparable fusion penalization between the two models.
The simulation study shows that estimation of the coefficients is unaffected by subgroup weighting as long as the sample size of the divergent subgroup is similar to the size of the two other subgroups. However, when the divergent subgroup is much larger than the other two subgroups, subgroup weighting leads to an improvement in the estimation error.
Figure 2: Simulation of the effect of subgroup weighting in a pathological dataset consisting of three subgroups: groups 1 and 2 of sample size n = 25 and a divergent group 3 of varying sample size. Data for groups 1 and 2 are generated from a linear regression model with coefficient vector β_1 of size p = 25, while data for the divergent group are generated from a different linear regression model with coefficient vector β_2. In a deliberate misspecification, we set λ and γ such that fusion of the coefficients is enforced. As the sample size of the divergent group increases, the RMSE of the fused lasso estimate of the coefficients should decrease. We observe that under the model weighted by the subgroup size, this happens more quickly than under the unweighted model once the imbalance among the subgroup sizes becomes sufficiently large.
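The role of the 1/n_k weights can be illustrated in a simplified limiting case. The sketch below assumes complete fusion (one coefficient shared by all subgroups), λ = 0, a single scalar predictor and noise-free data, so both criteria reduce to (weighted) least squares with a closed-form solution. It is not the simulation reported in Figure 2, and the function names and data are hypothetical.

```python
def fused_estimate(groups, weighted):
    """Common slope minimising the summed squared error over all groups.

    groups: list of (x, y) pairs, each a list of scalar observations.
    weighted: if True, each group's squared error is divided by its size n_k.
    """
    numerator = 0.0
    denominator = 0.0
    for x, y in groups:
        w = 1.0 / len(x) if weighted else 1.0
        numerator += w * sum(xi * yi for xi, yi in zip(x, y))
        denominator += w * sum(xi * xi for xi in x)
    return numerator / denominator


def make_group(slope, n):
    """Noise-free toy data y = slope * x, with x alternating between 1 and 2."""
    x = [1.0 + (i % 2) for i in range(n)]
    return x, [slope * xi for xi in x]


# Two small groups with slope 1 and one divergent group with slope 3.
for n_divergent in (4, 40):
    groups = [make_group(1.0, 4), make_group(1.0, 4), make_group(3.0, n_divergent)]
    print(
        n_divergent,
        fused_estimate(groups, weighted=False),
        fused_estimate(groups, weighted=True),
    )
```

With equal group sizes the two estimates coincide. As the divergent group grows, the unweighted estimate is pulled towards its slope, whereas the weighted estimate keeps giving each subgroup the same influence.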

Figure 1: Number of significant pathways in each of the four subgroups of the ADNI dataset (AD: Alzheimer's disease; EMCI: early mild cognitive impairment; LMCI: late mild cognitive impairment; CN: cognitively normal). Bar plots show the number of significant pathways for a given value of γ, while the black line shows the number of pathways that are deemed significant in all subgroups (scaled by the number of subgroups).