Split Bregman method for large scale fused Lasso

https://doi.org/10.1016/j.csda.2010.10.021

Abstract

Ordering of regression or classification coefficients occurs in many real-world applications. Fused Lasso exploits this ordering by explicitly regularizing the differences between neighboring coefficients through an l1 norm regularizer. However, due to nonseparability and nonsmoothness of the regularization term, solving the fused Lasso problem is computationally demanding. Existing solvers can only deal with problems of small or medium size, or a special case of the fused Lasso problem in which the predictor matrix is the identity matrix. In this paper, we propose an iterative algorithm based on the split Bregman method to solve a class of large-scale fused Lasso problems, including a generalized fused Lasso and a fused Lasso support vector classifier. We derive our algorithm using an augmented Lagrangian method and prove its convergence properties. The performance of our method is tested on both artificial data and real-world applications including proteomic data from mass spectrometry and genomic data from array comparative genomic hybridization (array CGH). We demonstrate that our method is many times faster than the existing solvers, and show that it is especially efficient for large p, small n problems, where p is the number of variables and n is the number of samples.

Introduction

Regularization terms that encourage sparsity in coefficients are increasingly being used in regression and classification procedures. One widely used example is the Lasso procedure for linear regression, which minimizes the usual sum of squared errors, but additionally penalizes the l1 norm of the regression coefficients. Because of the non-differentiability of the l1 norm, the Lasso procedure tends to shrink the regression coefficients toward zero and achieves sparseness. Fast and efficient algorithms are available to solve Lasso with as many as millions of variables, which makes it an attractive choice for many large-scale real-world applications.

The fused Lasso method introduced by Tibshirani et al. (2005) is an extension of Lasso that considers the situation where there is a certain natural ordering in the regression coefficients: an ordering is known for which the coefficients should be roughly piecewise constant (or, equivalently, the differences of neighboring coefficients should be sparse). Fused Lasso takes this natural ordering into account by placing an additional regularization term on the differences of "neighboring" coefficients. Consider the linear regression of $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i = (x_{i1}, \ldots, x_{ip})^T$ are the predictor variables and $y_i$ are the responses. (We assume the $x_{ij}$ and $y_i$ are standardized with zero mean and unit variance across observations.) Fused Lasso finds the coefficients of the linear regression by minimizing the following loss function
$$\Phi(\beta) = \frac{1}{2}\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda_1 \sum_{i=1}^{p} |\beta_i| + \lambda_2 \sum_{i=2}^{p} |\beta_i - \beta_{i-1}|, \qquad (1)$$
where the regularization term with parameter $\lambda_1$ encourages sparsity of the regression coefficients, while the regularization term with parameter $\lambda_2$ shrinks the differences between neighboring coefficients toward zero. As such, the method achieves both sparseness and smoothness in the regression coefficients.
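
To make the objective concrete, the following minimal sketch (in Python/NumPy rather than the authors' Matlab code; the toy data and the penalty values lam1 and lam2 are illustrative, not from the paper) evaluates Φ(β) of (1) for a given coefficient vector.

```python
import numpy as np

def fused_lasso_objective(beta, X, y, lam1, lam2):
    """Evaluate the fused Lasso loss Phi(beta) of Eq. (1)."""
    residual = y - X @ beta
    sq_error = 0.5 * np.sum(residual ** 2)          # quadratic error term
    sparsity = lam1 * np.sum(np.abs(beta))          # l1 penalty on the coefficients
    fusion = lam2 * np.sum(np.abs(np.diff(beta)))   # l1 penalty on neighboring differences
    return sq_error + sparsity + fusion

# Illustrative toy data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))
beta_true = np.zeros(50)
beta_true[10:20] = 1.0                              # sparse, piecewise-constant coefficients
y = X @ beta_true + 0.1 * rng.standard_normal(20)
print(fused_lasso_objective(beta_true, X, y, lam1=0.1, lam2=0.1))
```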

Regression or classification variables with some inherent ordering occur naturally in many real-world applications. In genomics, chromosomal features such as copy number variations (CNV), epigenetic modification patterns, and genes are ordered naturally by their chromosomal locations. In proteomics, molecular fragments from mass spectrometry (MS) measurements are ordered by their mass-to-charge ratios (m/z). In dynamic gene network inference, gene regulatory networks from developmentally closer cell types are more similar than those from more distant cell types (Ahmed and Xing, 2009). Fused Lasso exploits these natural orderings, so, not surprisingly, it has found particularly suitable applications in these areas. For example, Tibshirani and Wang successfully applied fused Lasso to detect DNA copy number variations in tumor samples using array comparative genomic hybridization (CGH) data (Tibshirani and Wang, 2008). Tibshirani et al. used fused Lasso to select proteomic features that can separate tumor from normal samples (Tibshirani et al., 2005). In addition to the application areas mentioned above, fused Lasso and its extensions have also found applications in a number of other areas, including image denoising (Friedman et al., 2007, Gao and Zhao, 2010), social networks (Ahmed and Xing, 2009), quantitative trait network analysis (Kim et al., 2009), etc.

The loss function in (1) is strictly convex, so a global optimal solution is guaranteed to exist. However, finding the optimal solution is computationally challenging due to the nondifferentiability of Φ(β). Existing methods circumvent the nondifferentiability of Φ(β) by introducing 2p−1 additional variables and converting the unconstrained optimization problem into a constrained one with 6p−1 linear inequality constraints. Standard convex optimization tools such as SQOPT (Philip et al., 2006) and CVX (Grant et al., 2008) can then be applied. Because of the large number of variables introduced, these methods are computationally demanding in terms of both time and space, and, in practice, have only been able to solve fused Lasso problems of small or medium size.
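
As a small-scale illustration of this route, the sketch below hands the problem to a generic disciplined convex programming tool (CVXPY in Python is used here as a stand-in for the Matlab packages SQOPT and CVX named above; it performs a similar reformulation internally). The problem size and penalty values are made up, and, as the paragraph notes, this approach is practical only for small or medium p.

```python
import numpy as np
import cvxpy as cp

n, p = 30, 100                        # small instance; generic solvers do not scale to large p
rng = np.random.default_rng(1)
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
lam1, lam2 = 0.1, 0.1                 # illustrative penalty parameters

D = np.diff(np.eye(p), axis=0)        # first-difference matrix: (D @ beta)[i] = beta[i+1] - beta[i]

beta = cp.Variable(p)
objective = cp.Minimize(0.5 * cp.sum_squares(y - X @ beta)
                        + lam1 * cp.norm1(beta)
                        + lam2 * cp.norm1(D @ beta))
cp.Problem(objective).solve()
print(np.count_nonzero(np.abs(beta.value) > 1e-6))   # number of (numerically) nonzero coefficients
```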

Component-wise coordinate descent has been proposed as an efficient approach for solving many l1 regularized convex optimization problems, including Lasso, grouped Lasso, elastic nets, graphical Lasso, logistic regression, etc. (Friedman et al., 2007). However, coordinate descent cannot be applied to the fused Lasso problem because the variables in the loss function Φ(β) are nonseparable due to the second regularization term, and as such, convergence is not guaranteed (Tseng, 2001).

For a special class of fused Lasso problems, the fused Lasso signal approximator (FLSA), in which the predictor matrix is the identity ($x_{ij} = 1$ if $i = j$ and 0 otherwise), efficient algorithms are available. A key observation about FLSA, first noted by Friedman et al., is that for fixed λ1, increasing λ2 can only cause pairs of variables to fuse; once fused, they do not become unfused for any larger value of λ2. This observation allowed Friedman et al. to develop a fusion algorithm that solves FLSA for a path of λ2 values by keeping track of fused variables and using coordinate descent for component-wise optimization. The fusion algorithm was later extended and generalized by Hoefling (2009). However, for the fusion algorithm to work, the solution path as a function of λ2 must be piecewise linear, which is not true for the general fused Lasso problem (Rosset and Zhu, 2007). As such, these algorithms are not applicable to the general fused Lasso case.
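
For reference, with the predictor matrix equal to the identity, the objective (1) reduces to the FLSA form
$$\Phi(\beta) = \frac{1}{2}\sum_{i=1}^{p} (y_i - \beta_i)^2 + \lambda_1 \sum_{i=1}^{p} |\beta_i| + \lambda_2 \sum_{i=2}^{p} |\beta_i - \beta_{i-1}|.$$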

In this paper, we propose a new method based on the split Bregman iteration for solving the general fused Lasso problem. Although the Bregman iteration is an old technique proposed in the 1960s (Brègman, 1967, Çetin, 1989), it gained significant interest only recently, after Osher and his coauthors demonstrated its high efficiency for image restoration (Osher et al., 2005, Goldstein and Osher, 2009, Cai et al., 2009a). Most recently, it has also been shown to be an efficient tool for compressed sensing (Cai et al., 2009b, Osher et al., 2010, Yin et al., 2008), matrix completion (Cai et al., 2010) and low-rank matrix recovery (Candes et al., 2009). In the following, we show that the general fused Lasso problem can be reformulated so that the split Bregman iteration can be readily applied.

The rest of the paper is organized as follows. In Section 2, we derive algorithms for a class of fused Lasso problems from an augmented Lagrangian function, including SBFLasso for the general fused Lasso, SBFLSA for FLSA, and SBFLSVM for the fused Lasso support vector classifier. The convergence properties of our algorithms are also presented. We demonstrate the performance and effectiveness of the algorithms through numerical examples in Section 3, and describe additional implementation details. The algorithms described in this paper are implemented in Matlab and are freely available from the authors.

Section snippets

Split Bregman iteration for a generalized fused Lasso problem

We first describe our algorithm in a more general setting than the one described in (1). Instead of the quadratic error function, we allow the error function to be any convex function of the regression coefficients. In addition, we relax the assumption that the coefficients should be ordered along a line as in (1), and allow the ordering to be specified arbitrarily, e.g., according to a graph. For the generalized fused Lasso, we find β by solving the following unconstrained optimization problem
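
To sketch how the split Bregman idea applies here, the following Python code implements a split Bregman (augmented Lagrangian) iteration for the quadratic-loss fused Lasso using the splitting a = β, b = Dβ, where D is the first-difference matrix. The step parameters mu1 and mu2, the fixed iteration count, and other details are illustrative assumptions; this is a generic sketch, not the paper's exact SBFLasso derivation.

```python
import numpy as np

def shrink(z, t):
    """Soft-thresholding (shrinkage) operator."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sb_fused_lasso(X, y, lam1, lam2, mu1=1.0, mu2=1.0, n_iter=200):
    """Split Bregman-style iteration for
    0.5*||y - X beta||^2 + lam1*||beta||_1 + lam2*||D beta||_1
    with the splitting a = beta, b = D beta.
    Illustrative sketch; not the paper's exact SBFLasso updates."""
    n, p = X.shape
    D = np.diff(np.eye(p), axis=0)                  # (p-1) x p first-difference matrix
    A = X.T @ X + mu1 * np.eye(p) + mu2 * D.T @ D   # fixed system matrix for the beta-update
    A_inv = np.linalg.inv(A)                        # for large p, use a factorization or CG instead
    Xty = X.T @ y
    beta = np.zeros(p)
    a = np.zeros(p)
    b = np.zeros(p - 1)
    u = np.zeros(p)        # scaled dual (Bregman) variable for a = beta
    v = np.zeros(p - 1)    # scaled dual (Bregman) variable for b = D beta
    for _ in range(n_iter):
        beta = A_inv @ (Xty + mu1 * (a - u) + mu2 * D.T @ (b - v))  # quadratic subproblem
        a = shrink(beta + u, lam1 / mu1)       # closed-form l1 shrinkage
        b = shrink(D @ beta + v, lam2 / mu2)   # closed-form shrinkage for the fusion term
        u += beta - a                          # Bregman / dual updates
        v += D @ beta - b
    return beta
```

The appeal of this splitting is that each subproblem is cheap: the β-update is a linear solve with a fixed system matrix, and the a- and b-updates are elementwise soft-thresholding steps.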

Experimental results

Next we illustrate the efficiency of the split Bregman method for fused Lasso using time trials on artificial data as well as real-world applications from genomics and proteomics. All our algorithms were implemented in Matlab, and compiled on a Windows platform. Time trials were generated on an Intel Core 2 Duo desktop PC (E7500, 2.93 GHz).

As the regression form of the fused Lasso procedures is more frequently used, we will thus focus on testing the performance of SBFLasso and SBFLSA. To

Discussion

Fused Lasso is an attractive framework for regression or classification problems with some natural ordering occurring in regression or classification coefficients. It exploits this natural ordering by explicitly regularizing the differences between neighboring coefficients through an l1 norm regularizer. Solving the fused Lasso problem is, however, challenging because of the nondifferentiability of the objective function and the nonseparability of the variables involved in the nondifferentiable

Acknowledgements

The work was partially supported by National Science Foundation grant DBI-0846218 and by a grant from University of California.

References (37)

  • A.E. Çetin. Reconstruction of signals from Fourier transform samples. Signal Processing (1989).
  • D. Gabay et al. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers and Mathematics with Applications (1976).
  • A. Ahmed et al. Recovering time-varying networks of dependencies in social and biological studies. Proceedings of the National Academy of Sciences (2009).
  • M. Bredel et al. High-resolution genome-wide mapping of genetic alterations in human glial brain tumors. Cancer Research (2005).
  • L.M. Brègman. A relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki (1967).
  • J.-F. Cai et al. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization (2010).
  • J.-F. Cai et al. Split Bregman methods and frame based image restoration. Multiscale Modeling and Simulation (2009).
  • J.-F. Cai et al. Linearized Bregman iterations for compressed sensing. Mathematics of Computation (2009).
  • Candes, E.J., Li, X., Ma, Y., Wright, J., 2009. Robust principal component analysis? Preprint....
  • M. Ceccarelli et al. A scale space approach for unsupervised feature selection in mass spectra classification for ovarian cancer detection. BMC Bioinformatics (2009).
  • J. Eckstein et al. On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming (1992).
  • Esser, E., 2009. Applications of Lagrangian-based alternating direction methods and connections to split Bregman. CAM...
  • J. Friedman et al. Pathwise coordinate optimization. The Annals of Applied Statistics (2007).
  • H. Gao et al. Multilevel bioluminescence tomography based on radiative transfer equation part 1: l1 regularization. Optics Express (2010).
  • Glowinski, R., Marroco, A., 1978. Sur l’approximation, par éléments finis d’ordre un, et la résolution, par...
  • T. Goldstein et al. The split Bregman method for L1-regularized problems. SIAM Journal on Imaging Sciences (2009).
  • Grant, M., Boyd, S., Ye, Y., 2008. CVX: Matlab software for disciplined convex programming. Available at:...
  • M.R. Hestenes. Multiplier and gradient methods. Journal of Optimization Theory and Applications (1969).