Abstract
We examine linear program (LP) approaches to boosting and demonstrate their efficient solution using LPBoost, a column generation based simplex method. We formulate the problem as if all possible weak hypotheses had already been generated. The labels produced by the weak hypotheses become the new feature space of the problem. The boosting task becomes to construct a learning function in this label space that minimizes misclassification error and maximizes the soft margin. We prove that for classification, minimizing the 1-norm soft margin error function directly optimizes a generalization error bound. The equivalent linear program can be efficiently solved using column generation techniques developed for large-scale optimization problems. The resulting LPBoost algorithm can be used to solve any LP boosting formulation by iteratively optimizing the dual misclassification costs in a restricted LP and dynamically generating weak hypotheses to make new LP columns. We provide algorithms for soft margin classification, confidence-rated, and regression boosting problems. Unlike gradient boosting algorithms, which may only converge in the limit, LPBoost converges in a finite number of iterations to a global solution satisfying mathematically well-defined optimality conditions. The optimal solutions of LPBoost are very sparse, in contrast with those of gradient-based methods. Computationally, LPBoost is competitive with AdaBoost in both solution quality and computational cost.
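The column generation loop described above can be sketched in a few lines: maintain dual misclassification costs, ask the weak learner for the hypothesis with the largest cost-weighted edge, stop when no hypothesis "prices out", and otherwise add its label vector as a new LP column and re-solve the restricted dual. The following is a minimal illustrative sketch (not the paper's implementation), using decision stumps on a toy 1-D dataset and SciPy's `linprog` (HiGHS backend) for the restricted LPs; the names `best_stump`, `lpboost`, and the toy data are assumptions for illustration.

```python
# Illustrative LPBoost sketch: column generation for the 1-norm soft-margin
# dual LP, with threshold stumps as weak hypotheses. Assumes scipy >= 1.7
# (for res.ineqlin.marginals). Toy data and names are hypothetical.
import numpy as np
from scipy.optimize import linprog

def best_stump(X, y, u):
    """Weak learner: stump maximizing the cost-weighted edge sum_i u_i y_i h(x_i)."""
    best_edge, best_h = -np.inf, None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for sign in (1, -1):
                h = sign * np.where(X[:, f] <= t, 1, -1)
                edge = np.sum(u * y * h)
                if edge > best_edge:
                    best_edge, best_h = edge, h
    return best_edge, best_h

def lpboost(X, y, nu=0.2, tol=1e-6, max_iter=50):
    n = len(y)
    D = 1.0 / (n * nu)          # box bound on the misclassification costs
    u = np.full(n, 1.0 / n)     # initial dual costs
    beta = 0.0
    H, a = [], np.zeros(0)      # generated columns and ensemble weights
    for _ in range(max_iter):
        edge, h = best_stump(X, y, u)
        if edge <= beta + tol:  # no column prices out: restricted LP is optimal
            break
        H.append(h)
        # Restricted dual LP over (u_1..u_n, beta): minimize beta subject to
        #   sum_i u_i y_i h_j(x_i) <= beta  for each generated h_j,
        #   sum_i u_i = 1,  0 <= u_i <= D.
        c = np.r_[np.zeros(n), 1.0]
        A_ub = np.array([np.r_[y * hj, -1.0] for hj in H])
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(len(H)),
                      A_eq=np.r_[np.ones(n), 0.0].reshape(1, -1), b_eq=[1.0],
                      bounds=[(0, D)] * n + [(None, None)], method="highs")
        u, beta = res.x[:n], res.x[n]
        a = -res.ineqlin.marginals  # primal ensemble weights = duals of <= rows
    return np.array(H), a

# Toy usage: two separable clusters on the line.
X = np.array([[0.1], [0.3], [0.4], [0.7], [0.8], [0.9]])
y = np.array([1, 1, 1, -1, -1, -1])
H, a = lpboost(X, y)
pred = np.sign(a @ H)
```

The ensemble weights fall out for free: they are the dual variables of the edge constraints in the restricted LP, so only hypotheses whose constraints are active receive nonzero weight, which is the source of the sparsity the abstract mentions.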
Demiriz, A., Bennett, K.P. & Shawe-Taylor, J. Linear Programming Boosting via Column Generation. Machine Learning 46, 225–254 (2002). https://doi.org/10.1023/A:1012470815092