Abstract
We examine linear program (LP) approaches to boosting and demonstrate their efficient solution using LPBoost, a column generation based simplex method. We formulate the problem as if all possible weak hypotheses had already been generated. The labels produced by the weak hypotheses become the new feature space of the problem. The boosting task becomes to construct a learning function in this label space that minimizes misclassification error and maximizes the soft margin. We prove that for classification, minimizing the 1-norm soft margin error function directly optimizes a generalization error bound. The equivalent linear program can be efficiently solved using column generation techniques developed for large-scale optimization problems. The resulting LPBoost algorithm can be used to solve any LP boosting formulation by iteratively optimizing the dual misclassification costs in a restricted LP and dynamically generating weak hypotheses to make new LP columns. We provide algorithms for soft margin classification, confidence-rated, and regression boosting problems. Unlike gradient boosting algorithms, which may only converge in the limit, LPBoost converges in a finite number of iterations to a global solution satisfying mathematically well-defined optimality conditions. The optimal solutions of LPBoost are very sparse, in contrast with those of gradient-based methods. Computationally, LPBoost is competitive with AdaBoost in both solution quality and computational cost.
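The column generation loop described above can be sketched in a few lines: maintain dual misclassification costs, ask the weak learner for the hypothesis with the largest cost-weighted edge, stop when no hypothesis "prices out", and otherwise add its label vector as a new LP column and re-solve the restricted dual. The following is a minimal illustrative sketch (not the paper's implementation), using decision stumps on a toy 1-D dataset and SciPy's `linprog` (HiGHS backend) for the restricted LPs; the names `best_stump`, `lpboost`, and the toy data are assumptions for illustration.

```python
# Illustrative LPBoost sketch: column generation for the 1-norm soft-margin
# dual LP, with threshold stumps as weak hypotheses. Assumes scipy >= 1.7
# (for res.ineqlin.marginals). Toy data and names are hypothetical.
import numpy as np
from scipy.optimize import linprog

def best_stump(X, y, u):
    """Weak learner: stump maximizing the cost-weighted edge sum_i u_i y_i h(x_i)."""
    best_edge, best_h = -np.inf, None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for sign in (1, -1):
                h = sign * np.where(X[:, f] <= t, 1, -1)
                edge = np.sum(u * y * h)
                if edge > best_edge:
                    best_edge, best_h = edge, h
    return best_edge, best_h

def lpboost(X, y, nu=0.2, tol=1e-6, max_iter=50):
    n = len(y)
    D = 1.0 / (n * nu)          # box bound on the misclassification costs
    u = np.full(n, 1.0 / n)     # initial dual costs
    beta = 0.0
    H, a = [], np.zeros(0)      # generated columns and ensemble weights
    for _ in range(max_iter):
        edge, h = best_stump(X, y, u)
        if edge <= beta + tol:  # no column prices out: restricted LP is optimal
            break
        H.append(h)
        # Restricted dual LP over (u_1..u_n, beta): minimize beta subject to
        #   sum_i u_i y_i h_j(x_i) <= beta  for each generated h_j,
        #   sum_i u_i = 1,  0 <= u_i <= D.
        c = np.r_[np.zeros(n), 1.0]
        A_ub = np.array([np.r_[y * hj, -1.0] for hj in H])
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(len(H)),
                      A_eq=np.r_[np.ones(n), 0.0].reshape(1, -1), b_eq=[1.0],
                      bounds=[(0, D)] * n + [(None, None)], method="highs")
        u, beta = res.x[:n], res.x[n]
        a = -res.ineqlin.marginals  # primal ensemble weights = duals of <= rows
    return np.array(H), a

# Toy usage: two separable clusters on the line.
X = np.array([[0.1], [0.3], [0.4], [0.7], [0.8], [0.9]])
y = np.array([1, 1, 1, -1, -1, -1])
H, a = lpboost(X, y)
pred = np.sign(a @ H)
```

The ensemble weights fall out for free: they are the dual variables of the edge constraints in the restricted LP, so only hypotheses whose constraints are active receive nonzero weight, which is the source of the sparsity the abstract mentions.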
Demiriz, A., Bennett, K.P. & Shawe-Taylor, J. Linear Programming Boosting via Column Generation. Machine Learning 46, 225–254 (2002). https://doi.org/10.1023/A:1012470815092