Split Bregman method for large scale fused Lasso

https://doi.org/10.1016/j.csda.2010.10.021

Abstract

Ordering of regression or classification coefficients occurs in many real-world applications. Fused Lasso exploits this ordering by explicitly regularizing the differences between neighboring coefficients through an l1 norm regularizer. However, due to nonseparability and nonsmoothness of the regularization term, solving the fused Lasso problem is computationally demanding. Existing solvers can only deal with problems of small or medium size, or a special case of the fused Lasso problem in which the predictor matrix is the identity matrix. In this paper, we propose an iterative algorithm based on the split Bregman method to solve a class of large-scale fused Lasso problems, including a generalized fused Lasso and a fused Lasso support vector classifier. We derive our algorithm using an augmented Lagrangian method and prove its convergence properties. The performance of our method is tested on both artificial data and real-world applications including proteomic data from mass spectrometry and genomic data from array comparative genomic hybridization (array CGH). We demonstrate that our method is many times faster than the existing solvers, and show that it is especially efficient for large p, small n problems, where p is the number of variables and n is the number of samples.

Introduction

Regularization terms that encourage sparsity in coefficients are increasingly being used in regression and classification procedures. One widely used example is the Lasso procedure for linear regression, which minimizes the usual sum of squared errors, but additionally penalizes the l1 norm of the regression coefficients. Because of the non-differentiability of the l1 norm, the Lasso procedure tends to shrink the regression coefficients toward zero and achieves sparseness. Fast and efficient algorithms are available to solve Lasso with as many as millions of variables, which makes it an attractive choice for many large-scale real-world applications.

The fused Lasso method introduced by Tibshirani et al. (2005) is an extension of Lasso that considers the situation where there is a certain natural ordering in the regression coefficients: an ordering is known for which the coefficients should be roughly piecewise constant (or, equivalently, the differences of neighboring coefficients should be sparse). Fused Lasso takes this natural ordering into account by placing an additional regularization term on the differences of "neighboring" coefficients. Consider the linear regression of $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i = (x_{i1}, \ldots, x_{ip})^T$ are the predictor variables and $y_i$ are the responses. (We assume the $x_{ij}$ and $y_i$ are standardized with zero mean and unit variance across observations.) Fused Lasso finds the coefficients of the linear regression by minimizing the following loss function
$$\Phi(\beta) = \frac{1}{2}\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda_1 \sum_{i=1}^{p} |\beta_i| + \lambda_2 \sum_{i=2}^{p} |\beta_i - \beta_{i-1}|, \qquad (1)$$
where the regularization term with parameter $\lambda_1$ encourages sparsity of the regression coefficients, while the regularization term with parameter $\lambda_2$ shrinks the differences between neighboring coefficients toward zero. As such, the method achieves both sparseness and smoothness in the regression coefficients.
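
To make the objective concrete, the following minimal sketch (in Python/NumPy rather than the authors' Matlab code; the toy data and the penalty values lam1 and lam2 are illustrative, not from the paper) evaluates Φ(β) of (1) for a given coefficient vector.

```python
import numpy as np

def fused_lasso_objective(beta, X, y, lam1, lam2):
    """Evaluate the fused Lasso loss Phi(beta) of Eq. (1)."""
    residual = y - X @ beta
    sq_error = 0.5 * np.sum(residual ** 2)          # quadratic error term
    sparsity = lam1 * np.sum(np.abs(beta))          # l1 penalty on the coefficients
    fusion = lam2 * np.sum(np.abs(np.diff(beta)))   # l1 penalty on neighboring differences
    return sq_error + sparsity + fusion

# Illustrative toy data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))
beta_true = np.zeros(50)
beta_true[10:20] = 1.0                              # sparse, piecewise-constant coefficients
y = X @ beta_true + 0.1 * rng.standard_normal(20)
print(fused_lasso_objective(beta_true, X, y, lam1=0.1, lam2=0.1))
```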

Regression or classification variables with some inherent ordering occur naturally in many real-world applications. In genomics, chromosomal features such as copy number variations (CNV), epigenetic modification patterns, and genes are ordered naturally by their chromosomal locations. In proteomics, molecular fragments from mass spectrometry (MS) measurements are ordered by their mass-to-charge ratios (m/z). In dynamic gene network inference, gene regulatory networks from developmentally closer cell types are more similar than those from more distant cell types (Ahmed and Xing, 2009). Fused Lasso exploits these natural orderings, so, not surprisingly, it has found particularly suitable applications in these areas. For example, Tibshirani and Wang successfully applied fused Lasso to detect DNA copy number variations in tumor samples using array comparative genomic hybridization (CGH) data (Tibshirani and Wang, 2008). Tibshirani et al. used fused Lasso to select proteomic features that can separate tumor from normal samples (Tibshirani et al., 2005). In addition to the application areas mentioned above, fused Lasso and its extensions have also found applications in a number of other areas, including image denoising (Friedman et al., 2007, Gao and Zhao, 2010), social networks (Ahmed and Xing, 2009), quantitative trait network analysis (Kim et al., 2009), etc.

The loss function in (1) is strictly convex, so a global optimal solution is guaranteed to exist. However, finding the optimal solution is computationally challenging due to the nondifferentiability of Φ(β). Existing methods circumvent the nondifferentiability of Φ(β) by introducing 2p−1 additional variables and converting the unconstrained optimization problem into a constrained one with 6p−1 linear inequality constraints. Standard convex optimization tools such as SQOPT (Philip et al., 2006) and CVX (Grant et al., 2008) can then be applied. Because of the large number of variables introduced, these methods are computationally demanding in terms of both time and space, and, in practice, have only been able to solve fused Lasso problems of small or medium size.
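
As a small-scale illustration of this route, the sketch below hands the problem to a generic disciplined convex programming tool (CVXPY in Python is used here as a stand-in for the Matlab packages SQOPT and CVX named above; it performs a similar reformulation internally). The problem size and penalty values are made up, and, as the paragraph notes, this approach is practical only for small or medium p.

```python
import numpy as np
import cvxpy as cp

n, p = 30, 100                        # small instance; generic solvers do not scale to large p
rng = np.random.default_rng(1)
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
lam1, lam2 = 0.1, 0.1                 # illustrative penalty parameters

D = np.diff(np.eye(p), axis=0)        # first-difference matrix: (D @ beta)[i] = beta[i+1] - beta[i]

beta = cp.Variable(p)
objective = cp.Minimize(0.5 * cp.sum_squares(y - X @ beta)
                        + lam1 * cp.norm1(beta)
                        + lam2 * cp.norm1(D @ beta))
cp.Problem(objective).solve()
print(np.count_nonzero(np.abs(beta.value) > 1e-6))   # number of (numerically) nonzero coefficients
```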

Component-wise coordinate descent has been proposed as an efficient approach for solving many l1 regularized convex optimization problems, including Lasso, grouped Lasso, elastic nets, graphical Lasso, logistic regression, etc. (Friedman et al., 2007). However, coordinate descent cannot be applied to the fused Lasso problem because the variables in the loss function Φ(β) are nonseparable due to the second regularization term, and as such, convergence is not guaranteed (Tseng, 2001).

For a special class of fused Lasso problems, the fused Lasso signal approximator (FLSA), in which the predictor matrix is the identity ($x_{ij} = 1$ if $i = j$ and 0 otherwise), efficient algorithms are available. A key observation about FLSA, first noted by Friedman et al., is that for fixed λ1, increasing λ2 can only cause pairs of variables to fuse; once fused, they do not become unfused for any larger value of λ2. This observation allowed Friedman et al. to develop a fusion algorithm that solves FLSA for a path of λ2 values by keeping track of fused variables and using coordinate descent for component-wise optimization. The fusion algorithm was later extended and generalized by Hoefling (2009). However, for the fusion algorithm to work, the solution path as a function of λ2 must be piecewise linear, which is not true for the general fused Lasso problem (Rosset and Zhu, 2007). As such, these algorithms are not applicable to the general fused Lasso case.
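
For reference, with the predictor matrix equal to the identity, the objective (1) reduces to the FLSA form
$$\Phi(\beta) = \frac{1}{2}\sum_{i=1}^{p} (y_i - \beta_i)^2 + \lambda_1 \sum_{i=1}^{p} |\beta_i| + \lambda_2 \sum_{i=2}^{p} |\beta_i - \beta_{i-1}|.$$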

In this paper, we propose a new method based on the split Bregman iteration for solving the general fused Lasso problem. Although the Bregman iteration is an old technique proposed in the 1960s (Brègman, 1967, Çetin, 1989), it gained significant interest only recently, after Osher and his coauthors demonstrated its high efficiency for image restoration (Osher et al., 2005, Goldstein and Osher, 2009, Cai et al., 2009a). Most recently, it has also been shown to be an efficient tool for compressed sensing (Cai et al., 2009b, Osher et al., 2010, Yin et al., 2008), matrix completion (Cai et al., 2010) and low-rank matrix recovery (Candes et al., 2009). In the following, we show that the general fused Lasso problem can be reformulated so that the split Bregman iteration can be readily applied.

The rest of the paper is organized as follows. In Section 2, we derive algorithms for a class of fused Lasso problems from an augmented Lagrangian function, including SBFLasso for the general fused Lasso, SBFLSA for FLSA, and SBFLSVM for the fused Lasso support vector classifier. The convergence properties of our algorithms are also presented. We demonstrate the performance and effectiveness of the algorithms through numerical examples in Section 3, and describe additional implementation details. The algorithms described in this paper are implemented in Matlab and are freely available from the authors.

Section snippets

Split Bregman iteration for a generalized fused Lasso problem

We first describe our algorithm in a more general setting than the one described in (1). Instead of the quadratic error function, we allow the error function to be any convex function of the regression coefficients. In addition, we relax the assumption that the coefficients should be ordered along a line as in (1), and allow the ordering to be specified arbitrarily, e.g., according to a graph. For the generalized fused Lasso, we find β by solving the following unconstrained optimization problem
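
To sketch how the split Bregman idea applies here, the following Python code implements a split Bregman (augmented Lagrangian) iteration for the quadratic-loss fused Lasso using the splitting a = β, b = Dβ, where D is the first-difference matrix. The step parameters mu1 and mu2, the fixed iteration count, and other details are illustrative assumptions; this is a generic sketch, not the paper's exact SBFLasso derivation.

```python
import numpy as np

def shrink(z, t):
    """Soft-thresholding (shrinkage) operator."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sb_fused_lasso(X, y, lam1, lam2, mu1=1.0, mu2=1.0, n_iter=200):
    """Split Bregman-style iteration for
    0.5*||y - X beta||^2 + lam1*||beta||_1 + lam2*||D beta||_1
    with the splitting a = beta, b = D beta.
    Illustrative sketch; not the paper's exact SBFLasso updates."""
    n, p = X.shape
    D = np.diff(np.eye(p), axis=0)                  # (p-1) x p first-difference matrix
    A = X.T @ X + mu1 * np.eye(p) + mu2 * D.T @ D   # fixed system matrix for the beta-update
    A_inv = np.linalg.inv(A)                        # for large p, use a factorization or CG instead
    Xty = X.T @ y
    beta = np.zeros(p)
    a = np.zeros(p)
    b = np.zeros(p - 1)
    u = np.zeros(p)        # scaled dual (Bregman) variable for a = beta
    v = np.zeros(p - 1)    # scaled dual (Bregman) variable for b = D beta
    for _ in range(n_iter):
        beta = A_inv @ (Xty + mu1 * (a - u) + mu2 * D.T @ (b - v))  # quadratic subproblem
        a = shrink(beta + u, lam1 / mu1)       # closed-form l1 shrinkage
        b = shrink(D @ beta + v, lam2 / mu2)   # closed-form shrinkage for the fusion term
        u += beta - a                          # Bregman / dual updates
        v += D @ beta - b
    return beta
```

The appeal of this splitting is that each subproblem is cheap: the β-update is a linear solve with a fixed system matrix, and the a- and b-updates are elementwise soft-thresholding steps.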

Experimental results

Next we illustrate the efficiency of the split Bregman method for fused Lasso using time trials on artificial data as well as real-world applications from genomics and proteomics. All our algorithms were implemented in Matlab, and compiled on a Windows platform. Time trials were generated on an Intel Core 2 Duo desktop PC (E7500, 2.93 GHz).

As the regression form of the fused Lasso procedures is more frequently used, we will thus focus on testing the performance of SBFLasso and SBFLSA. To

Discussion

Fused Lasso is an attractive framework for regression or classification problems with some natural ordering occurring in regression or classification coefficients. It exploits this natural ordering by explicitly regularizing the differences between neighboring coefficients through an l1 norm regularizer. Solving the fused Lasso problem is, however, challenging because of the nondifferentiability of the objective function and the nonseparability of the variables involved in the nondifferentiable

Acknowledgements

The work was partially supported by National Science Foundation grant DBI-0846218 and by a grant from University of California.

References (37)

  • A.E. Çetin. Reconstruction of signals from Fourier transform samples. Signal Processing (1989).
  • D. Gabay et al. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers and Mathematics with Applications (1976).
  • A. Ahmed et al. Recovering time-varying networks of dependencies in social and biological studies. Proceedings of the National Academy of Sciences (2009).
  • M. Bredel et al. High-resolution genome-wide mapping of genetic alterations in human glial brain tumors. Cancer Research (2005).
  • L.M. Brègman. A relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki (1967).
  • J.-F. Cai et al. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization (2010).
  • J.-F. Cai et al. Split Bregman methods and frame based image restoration. Multiscale Modeling and Simulation (2009).
  • J.-F. Cai et al. Linearized Bregman iterations for compressed sensing. Mathematics of Computation (2009).
  • Candes, E.J., Li, X., Ma, Y., Wright, J., 2009. Robust principal component analysis? Preprint....
  • M. Ceccarelli et al. A scale space approach for unsupervised feature selection in mass spectra classification for ovarian cancer detection. BMC Bioinformatics (2009).
  • J. Eckstein et al. On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming (1992).
  • Esser, E., 2009. Applications of Lagrangian-based alternating direction methods and connections to split Bregman. CAM...
  • J. Friedman et al. Pathwise coordinate optimization. The Annals of Applied Statistics (2007).
  • H. Gao et al. Multilevel bioluminescence tomography based on radiative transfer equation part 1: l1 regularization. Optics Express (2010).
  • Glowinski, R., Marroco, A., 1978. Sur l’approximation, par éléments finis d’ordre un, et la résolution, par...
  • T. Goldstein et al. The split Bregman method for L1-regularized problems. SIAM Journal on Imaging Sciences (2009).
  • Grant, M., Boyd, S., Ye, Y., 2008. CVX: Matlab software for disciplined convex programming. Available at:...
  • M.R. Hestenes. Multiplier and gradient methods. Journal of Optimization Theory and Applications (1969).