Accelerating exact and approximate inference for (distributed) discrete optimization with GPUs

Fioretto, Ferdinando; Pontelli, Enrico; Yeoh, William; Dechter, Rina

doi:10.1007/s10601-017-9274-1

Published: 18 August 2017

Constraints, Volume 23, pages 1–43 (2018)

Ferdinando Fioretto (ORCID: orcid.org/0000-0002-1381-6776)¹, Enrico Pontelli², William Yeoh³ & Rina Dechter⁴

Abstract

Discrete optimization is a central problem in artificial intelligence. The optimization of the aggregated cost of a network of cost functions arises in a variety of problems, including Weighted Constraint Satisfaction Problems (WCSPs), Distributed Constraint Optimization Problems (DCOPs), and optimization in stochastic variants, such as the task of finding the most probable explanation (MPE) in belief networks. Inference-based algorithms are powerful techniques for solving discrete optimization problems and can be used either independently or in combination with other techniques. However, their applicability is often limited by their compute-intensive nature and their space requirements. This paper proposes the design and implementation of a novel inference-based technique that exploits modern massively parallel architectures, such as those found in Graphics Processing Units (GPUs), to speed up exact and approximate inference-based algorithms for discrete optimization. The paper studies the proposed algorithm in both centralized and distributed optimization contexts and demonstrates that the use of GPUs provides significant advantages in terms of runtime and scalability, achieving speedups of up to two orders of magnitude (up to 345 times faster) with respect to a sequential version.
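The central step of bucket elimination (BE) on a WCSP combines the cost functions in a variable's bucket by summation and then minimizes that variable out, producing a new table over the remaining variables. A minimal sequential Python sketch of this step follows, assuming flat tables stored in lexicographic row order with respect to a predefined variable ordering; the helpers `row_index`, `decode_row`, and `eliminate` are hypothetical illustrations, not functions from the authors' GpuBE or GpuDBE code. Because every output row is computed independently, the marked loop is the kind of work a GPU kernel can distribute across threads, while the output table grows exponentially with the number of variables in its scope.

```python
# A cost function is (scope, table): scope is a tuple of variable names listed in
# a fixed global order; table is a flat list indexed in row-major (lexicographic)
# order over the domains of the scope's variables. Values are 0..dom[v]-1.


def row_index(assignment, scope, dom):
    """Map an assignment (dict var -> value) to its row in a table over scope."""
    idx = 0
    for v in scope:
        idx = idx * dom[v] + assignment[v]
    return idx


def decode_row(row, scope, dom):
    """Inverse of row_index: recover the assignment stored at a given row."""
    assignment = {}
    for v in reversed(scope):
        row, assignment[v] = divmod(row, dom[v])
    return assignment


def eliminate(var, functions, order, dom):
    """Sum the functions in var's bucket and project var out by minimization."""
    scope_vars = set()
    for scope, _ in functions:
        scope_vars.update(scope)
    scope_vars.discard(var)
    out_scope = tuple(v for v in order if v in scope_vars)
    n_rows = 1
    for v in out_scope:
        n_rows *= dom[v]  # table size is the product of the domain sizes
    out_table = [0] * n_rows
    # Rows are independent: a GPU version could assign one thread per row
    # (illustrative assumption, not the authors' kernel design).
    for row in range(n_rows):
        assignment = decode_row(row, out_scope, dom)
        best = float("inf")
        for val in range(dom[var]):
            assignment[var] = val
            cost = sum(t[row_index(assignment, s, dom)] for s, t in functions)
            best = min(best, cost)
        out_table[row] = best
    return out_scope, out_table


dom = {"x1": 2, "x2": 2, "x3": 2}
order = ("x1", "x2", "x3")
f12 = (("x1", "x2"), [0, 3, 2, 1])
f23 = (("x2", "x3"), [5, 0, 1, 4])
print(eliminate("x2", [f12, f23], order, dom))  # (('x1', 'x3'), [4, 0, 2, 2])
```

Here `x2` is eliminated from a bucket holding two binary cost functions, and each entry of the resulting table over `(x1, x3)` stores the cheapest cost achievable over the values of `x2`.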

Notes

For simplicity, we assume that tuples of variables are built according to a predefined ordering.
For simplicity, we also use \(\theta\) to represent the tuple \(\langle \theta(x_{i_1}), \dots, \theta(x_{i_h}) \rangle\), where \(\{x_{i_1}, \dots, x_{i_h}\}\) is the domain of \(\theta\).
The primal graph of a DCOP is equivalent to that of the corresponding WCSP.
A warp is typically composed of 32 threads.
In modern devices, each streaming multiprocessor (SM) allots 64KB for register space.
Accesses to GPU global memory are cached in 128-byte cache lines, which can be fetched by all requesting threads in a warp.
Our source code is available at https://github.com/nandofioretto/GpuBE and https://github.com/nandofioretto/GpuDBE.
Downloadable from http://costfunction.org/en/benchmark/ and http://graphmod.ics.uci.edu/group/Repository
Recall that BE needs to process bucket-tables whose number of rows is in \(O(d^{w^{*}})\).
We use the Pearson product-moment correlation coefficient.
In all other experiments, we used the GeForce GTX Titan, as it was the best and most affordable card at our disposal.
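The notes on warps and on 128-byte cache lines explain why GPU kernels favor contiguous, aligned accesses to global memory. The simplified Python model below counts how many distinct cache lines one warp touches when thread `t` reads element `t * stride` of an array; the helper `cache_lines_touched`, the 4-byte element size, and the idealized one-access-per-thread pattern are assumptions for illustration, and real devices add caching and scheduling effects that this model ignores.

```python
WARP_SIZE = 32          # typical number of threads per warp
CACHE_LINE_BYTES = 128  # global-memory cache line size


def cache_lines_touched(base_addr, stride_elems, elem_bytes=4):
    """Count the distinct cache lines touched when thread t of a single warp
    reads the element at byte address base_addr + t * stride_elems * elem_bytes.

    Simplified model: assumes elements never straddle a cache-line boundary
    and ignores caching across warps.
    """
    lines = set()
    for t in range(WARP_SIZE):
        addr = base_addr + t * stride_elems * elem_bytes
        lines.add(addr // CACHE_LINE_BYTES)  # index of the line holding addr
    return len(lines)


print(cache_lines_touched(0, 1))   # contiguous, aligned: 1 line
print(cache_lines_touched(64, 1))  # contiguous, misaligned: 2 lines
print(cache_lines_touched(0, 32))  # 128-byte stride: 32 lines
```

Under this model, a contiguous aligned read by a full warp is served by a single line, whereas a 128-byte stride needs one line per thread, which suggests why data layouts that let neighboring threads read neighboring addresses are preferable.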

References

Abdennadher, S., & Schlenker, H. (1999). Nurse scheduling using constraint logic programming. In Proceedings of the conference on innovative applications of artificial intelligence (IAAI) (pp. 838–843).
Allouche, D., André, I., Barbe, S., Davies, J., de Givry, S., Katsirelos, G., O’Sullivan, B., Prestwich, S.D., Schiex, T., & Traoré, S. (2014). Computational protein design as an optimization problem. Artificial Intelligence, 212, 59–79.
Allouche, D., de Givry, S., Nguyen, H., & Schiex, T. (2013). Toulbar2 to solve Weighted Partial max-SAT. Tech. rep. INRA.
Apt, K. (2003). Principles of constraint programming. Cambridge University Press.
Arbelaez, A., & Codognet, P. (2014). A GPU implementation of parallel constraint-based local search. In Proceedings of the euromicro international conference on parallel, distributed and network-based processing (PDP) (pp. 648–655).
Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.
Bistaffa, F., Bombieri, N., & Farinelli, A. (2016). CUBE: a CUDA approach for bucket elimination on GPUs. In Proceedings of the European conference on artificial intelligence (ECAI) (to appear).
Bistarelli, S., Montanari, U., & Rossi, F. (1997). Semiring-based constraint satisfaction and optimization. Journal of the ACM, 44(2), 201–236.
Boyer, V., El Baz, D., & Elkihel, M. (2012). Solving knapsack problems on GPU. Computers & Operations Research, 39(1), 42–47.
Brito, I., & Meseguer, P. (2010). Improving DPOP with function filtering. In Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS) (pp. 141–158).
Burke, E.K., De Causmaecker, P., Berghe, G.V., & Van Landeghem, H. (2004). The state of the art of nurse rostering. Journal of Scheduling, 7(6), 441–499.
Campeotto, F., Dovier, A., Fioretto, F., & Pontelli, E. (2014). A GPU implementation of large neighborhood search for solving constraint optimization problems. In Proceedings of the european conference on artificial intelligence (ECAI) (pp. 189–194).
Campeotto, F., Palù, A.D., Dovier, A., Fioretto, F., & Pontelli, E. (2013). A constraint solver for flexible protein model. Journal of Artificial Intelligence Research, 48, 953–1000.
Chakroun, I., Mezmaz, M.S., Melab, N., & Bendjoudi, A. (2013). Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm. Concurrency and Computation: Practice and Experience, 25(8), 1121–1136.
Dechter, R. (1999). Bucket elimination: a unifying framework for reasoning. Artificial Intelligence, 113(1), 41–85.
Dechter, R. (2003). Constraint processing. San Francisco: Morgan Kaufmann Publishers Inc.
Dechter, R. (2013). Reasoning with probabilistic and deterministic graphical models: exact algorithms. Synthesis Lectures on Artificial Intelligence and Machine Learning, 7(3), 1–191.
Dechter, R., & Pearl, J. (1988). Network-based heuristics for constraint-satisfaction problems. Springer.
Dechter, R., & Rish, I. (2003). Mini-buckets: a general scheme for bounded inference. Journal of the ACM, 50(2), 107–153.
Diamos, G.F., Ashbaugh, B., Maiyuran, S., Kerr, A., Wu, H., & Yalamanchili, S. (2011). SIMD re-convergence at thread frontiers. In Proceedings of the annual IEEE/ACM international symposium on microarchitecture (pp. 477–488).
Dovier, A., Formisano, A., & Pontelli, E. (2013). Autonomous agents coordination: action languages meet CLP(FD) and Linda. Theory and Practice of Logic Programming, 13(2), 149–173.
Edelkamp, S., Jabbar, S., & Schrödl, S. (2004). External A*. In Advances in artificial intelligence: 27th annual German conference on AI, (KI) 2004 (pp. 226–240).
Farinelli, A., Rogers, A., Petcu, A., & Jennings, N. (2008). Decentralised coordination of low-power embedded devices using the Max-Sum algorithm. In Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS) (pp. 639–646).
Fioretto, F., Dovier, A., & Pontelli, E. (2015). Constrained community-based gene regulatory network inference. ACM Trans. Model. Comput. Simul., 25(2), 11.
Fioretto, F., Le, T., Yeoh, W., Pontelli, E., & Son, T.C. (2014). Improving DPOP with branch consistency for solving distributed constraint optimization problems. In Proceedings of the international conference on principles and practice of constraint programming (CP) (pp. 307–323).
Fioretto, F., Le, T., Yeoh, W., Pontelli, E., & Son, T.C. (2015). Exploiting GPUs in solving (distributed) constraint optimization problems with dynamic programming. In Proceedings of the international conference on principles and practice of constraint programming (CP) (pp. 121–139).
Fioretto, F., Yeoh, W., & Pontelli, E. (2016). A dynamic programming-based MCMC framework for solving DCOPs with GPUs. In Proceedings of the international conference on principles and practice of constraint programming (CP) (pp. 813–831).
Fioretto, F., Yeoh, W., & Pontelli, E. (2016). Multi-variable agent decomposition for DCOPs. In Proceedings of the AAAI conference on artificial intelligence (AAAI) (pp. 2480–2486).
Fioretto, F., Yeoh, W., & Pontelli, E. (2017). A multiagent system approach to scheduling devices in smart homes. In Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS) (pp. 981–989).
Fioretto, F., Yeoh, W., Pontelli, E., Ma, Y., & Ranade, S. (2017). A DCOP approach to the economic dispatch with demand response. In Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS) (pp. 981–989).
Fishelson, M., & Geiger, D. (2002). Exact genetic linkage computations for general pedigrees. Bioinformatics, 18(suppl 1), S189–S198.
Friedman, N., Linial, M., Nachman, I., & Pe’er, D. (2000). Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7(3-4), 601–620.
Gaudreault, J., Frayret, J.M., & Pesant, G. (2009). Distributed search for supply chain coordination. Computers in Industry, 60(6), 441–451.
Gupta, S., Yeoh, W., Pontelli, E., Jain, P., & Ranade, S.J. (2013). Modeling microgrid islanding problems as DCOPs. In North American power symposium (NAPS) (pp. 1–6): IEEE.
Hamadi, Y., Bessière, C., & Quinqueton, J. (1998). Distributed intelligent backtracking. In Proceedings of the European conference on artificial intelligence (ECAI) (pp. 219–223).
Han, T.D., & Abdelrahman, T.S. (2011). Reducing branch divergence in GPU programs. In Proceedings of the fourth workshop on general purpose processing on graphics processing units (pp. 3:1–3:8). New York: ACM Press.
Kask, K., Dechter, R., & Gelfand, A.E. (2012). BEEM: bucket elimination with external memory. arXiv:1203.3487.
Kumar, A., Faltings, B., & Petcu, A. (2009). Distributed constraint optimization with structured resource constraints. In Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS) (pp. 923–930).
Lalami, M.E., El Baz, D., & Boyer, V. (2011). Multi GPU implementation of the simplex algorithm. In Proceedings of the international conference on high performance computing and communication (HPCC), (Vol. 11 pp. 179–186).
Larrosa, J. (2002). Node and arc consistency in weighted CSP. In Proceedings of the AAAI conference on artificial intelligence (AAAI) (pp. 48–53).
Otten, L., & Dechter, R. (2017). AND/OR branch-and-bound on a computational grid. Journal of Artificial Intelligence Research (to appear).
Le, T., Fioretto, F., Yeoh, W., Son, T.C., & Pontelli, E. (2016). ER-DCOPs: a framework for distributed constraint optimization with uncertainty in constraint utilities. In Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS) (pp. 605–614).
Lerner, U., Parr, R., Koller, D., Biswas, G., et al. (2000). Bayesian fault detection and diagnosis in dynamic systems. In AAAI/IAAI (pp. 531–537).
Lim, H., Yuan, C., & Hansen, E.A. (2010). Scaling up MAP search in Bayesian networks using external memory. On Probabilistic Graphical Models, 177.
Maheswaran, R., Tambe, M., Bowring, E., Pearce, J., & Varakantham, P. (2004). Taking DCOP to the real world: efficient complete solutions for distributed event scheduling. In Proceedings of the international conference on autonomous agents and multiagent systems (AAMAS) (pp. 310–317).
Marinescu, R., & Dechter, R. (2009). Memory intensive AND/OR search for combinatorial optimization in graphical models. Artificial Intelligence, 173(16-17), 1492–1524.
Modi, P., Shen, W.M., Tambe, M., & Yokoo, M. (2005). ADOPT: asynchronous distributed constraint optimization with quality guarantees. Artificial Intelligence, 161(1–2), 149–180.
Montanari, U. (1974). Networks of constraints: fundamental properties and applications to picture processing. Information Sciences, 7, 95–132.
Pawłowski, K., Kurach, K., Michalak, T., & Rahwan, T. (2014). Coalition structure generation with the graphic processor unit. Tech. Rep. CS-RR-13-07, Department of Computer Science, University of Oxford.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Francisco: Morgan Kaufmann Publishers Inc.
Pesant, G. (2004). A regular language membership constraint for finite sequences of variables. In Proceedings of the international conference on principles and practice of constraint programming (CP) (pp. 482–495).
Petcu, A., & Faltings, B. (2005). Approximations in distributed optimization. In Proceedings of the international conference on principles and practice of constraint programming (CP) (pp. 802–806).
Petcu, A., & Faltings, B. (2005). A scalable method for multiagent constraint optimization. In Proceedings of the international joint conference on artificial intelligence (IJCAI) (pp. 1413–1420).
Quimper, C.G., & Walsh, T. (2006). Global grammar constraints. In Proceedings of the international conference on principles and practice of constraint programming (CP) (pp. 751–755): Springer.
Rodrigues, L., & Magatao, L. (2007). Enhancing supply chain decisions using constraint programming: a case study. In MICAI 2007: advances in artificial intelligence, (Vol. LNCS 4827 pp. 1110–1121): Springer.
Rossi, F., van Beek, P., & Walsh, T. (eds.) (2006). Handbook of constraint programming. Elsevier.
Rust, P., Picard, G., & Ramparany, F. (2016). Using message-passing DCOP algorithms to solve energy-efficient smart environment configuration problems. In Proceedings of the international joint conference on artificial intelligence (IJCAI) (pp. 468–474).
Sanders, J., & Kandrot, E. (2010). CUDA by example: an introduction to general-purpose GPU programming. Addison-Wesley.
Sandholm, T. (2002). Algorithm for optimal winner determination in combinatorial auctions. Artificial Intelligence, 135(1), 1–54.
Schiex, T., Fargier, H., Verfaillie, G., et al. (1995). Valued constraint satisfaction problems: Hard and easy problems. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 95, 631–639.
Shapiro, L.G., & Haralick, R.M. (1981). Structural descriptions and inexact matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3(5), 504–519.
Silberstein, M., Schuster, A., Geiger, D., Patney, A., & Owens, J.D. (2008). Efficient computation of sum-products on GPUs through software-managed cache. In Proceedings of the 22nd annual international conference on supercomputing (pp. 309–318): ACM.
Sturtevant, N.R., & Rutherford, M.J. (2013). Minimizing writes in parallel external memory search. In Proceedings of the international joint conference on artificial intelligence (IJCAI).
Sultanik, E., Modi, P.J., & Regli, W.C. (2007). On modeling multiagent task scheduling as a distributed constraint optimization problem. In Proceedings of the international joint conference on artificial intelligence (IJCAI) (pp. 1531–1536).
Trick, M.A. (2003). A dynamic programming approach for consistency and propagation for knapsack constraints. Annals of Operations Research, 118(1-4), 73–84.
Yeoh, W., Felner, A., & Koenig, S. (2010). BnB-ADOPT: an asynchronous branch-and-bound DCOP algorithm. Journal of Artificial Intelligence Research, 38, 85–133.
Yeoh, W., & Yokoo, M. (2012). Distributed problem solving. AI Magazine, 33(3), 53–65.
Zivan, R., Yedidsion, H., Okamoto, S., Glinton, R., & Sycara, K. (2015). Distributed constraint optimization for teams of mobile sensing agents. Journal of Autonomous Agents and Multi-Agent Systems, 29(3), 495–536.

Acknowledgements

We thank the anonymous reviewers for their comments. This research is partially supported by the National Science Foundation under grants 1345232, 1401639, 1458595, 1526842, and 1550662. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the sponsoring organizations, agencies, or the U.S. government.

Author information

Authors and Affiliations

Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI, USA
Ferdinando Fioretto
Computer Science, New Mexico State University, Las Cruces, NM, USA
Enrico Pontelli
Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO, USA
William Yeoh
School of Information and Computer Science, University of California, Irvine, CA, USA
Rina Dechter

Corresponding author

Correspondence to Ferdinando Fioretto.

Additional information

This journal article is an extended version of an earlier conference paper [26]. It includes (i) a parallelized design and implementation of Mini-Bucket Elimination with GPUs on WCSPs; (ii) a more detailed description of the GPU operations to ease reproducibility; (iii) a significantly more comprehensive empirical evaluation with additional WCSP benchmarks and different GPU devices.
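Mini-Bucket Elimination, item (i) above, bounds the cost of BE by splitting a bucket whose combined scope is too large into smaller mini-buckets that are processed separately, which for minimization yields a lower bound rather than the exact optimum. One common way to form the partition is a greedy first-fit under an i-bound, sketched below in Python; the function name `partition_mini_buckets`, the convention that the i-bound caps the number of variables in each mini-bucket's combined scope, and the largest-scope-first heuristic are assumptions for illustration, not a description of the authors' GPU implementation.

```python
def partition_mini_buckets(scopes, i_bound):
    """Greedily partition a bucket's cost-function scopes into mini-buckets.

    scopes  : list of tuples of variable names (one per cost function)
    i_bound : maximum number of variables in a mini-bucket's combined scope
    Returns a list of (sorted combined scope, indices of the functions it holds).
    A single scope larger than i_bound still gets its own mini-bucket.
    """
    mini_buckets = []  # each entry: (set of variables, list of function indices)
    # Process larger scopes first so that they seed the mini-buckets.
    order = sorted(range(len(scopes)), key=lambda k: -len(scopes[k]))
    for k in order:
        s = set(scopes[k])
        for variables, members in mini_buckets:
            if len(variables | s) <= i_bound:
                variables |= s  # grow the combined scope in place
                members.append(k)
                break
        else:  # no existing mini-bucket can absorb this function
            mini_buckets.append((s, [k]))
    return [(sorted(v), m) for v, m in mini_buckets]


bucket = [("x1", "x2"), ("x1", "x3"), ("x1", "x2", "x4")]
print(partition_mini_buckets(bucket, 3))
# [(['x1', 'x2', 'x4'], [2, 0]), (['x1', 'x3'], [1])]
```

With an i-bound of 3, the function over `(x1, x3)` cannot join the first mini-bucket without raising its combined scope to four variables, so it starts a second one; each mini-bucket then eliminates `x1` on its own, producing smaller tables at the price of a bound instead of an exact value.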

About this article

Cite this article

Fioretto, F., Pontelli, E., Yeoh, W. et al. Accelerating exact and approximate inference for (distributed) discrete optimization with GPUs. Constraints 23, 1–43 (2018). https://doi.org/10.1007/s10601-017-9274-1

Issue Date: January 2018
DOI: https://doi.org/10.1007/s10601-017-9274-1
