Decision Programming for Multi-Stage Optimization under Uncertainty

Influence diagrams are widely employed to represent multi-stage decision problems in which each decision is a choice from a discrete set of alternatives, uncertain chance events have discrete outcomes, and prior decisions may endogenously influence the probability distributions of these chance events. In this paper, we develop the Decision Programming framework, which extends the applicability of influence diagrams by providing mixed-integer linear programming formulations for solving such problems in the presence of many kinds of constraints. In particular, Decision Programming makes it possible to (i) solve problems in which earlier decisions cannot necessarily be recalled later, for instance, when decisions are taken by agents who cannot communicate with each other; (ii) accommodate a broad range of deterministic and chance constraints, including those based on resource consumption, logical dependencies, or risk measures such as Conditional Value-at-Risk; and (iii) determine all non-dominated decision strategies in problems which involve multiple value objectives. In project portfolio selection problems, Decision Programming allows scenario probabilities to depend endogenously on project decisions and can thus be viewed as a generalization of Contingent Portfolio Programming.


Introduction
Influence diagrams, in their many variants (see, e.g., Bielza et al., 2011; Diehl & Haimes, 2004; Díez et al., 2018; Howard & Matheson, 1984), are widely employed to represent decision problems whose consequences depend on discrete sets of both uncertain chance events and decisions which are taken in multiple stages. Specifically, such decisions and chance events are represented by decision and chance nodes in an acyclic graph whose arcs indicate (i) what information is available to the decision maker (DM) and (ii) how realizations of chance events depend on earlier decisions and chance events. The value node represents consequences which are associated with the DM's decisions and the realization of chance events.
The DM's risk preferences are typically modeled with a utility function over the set of consequences. The optimal solution to the influence diagram is the decision strategy that, at each decision node, assigns a decision alternative to every possible state of information at this node so that the combination of these decisions maximizes the DM's expected utility. If the diagram fulfills the 'no forgetting' assumption, meaning that earlier decisions can be recalled when making later ones (see, e.g., Lauritzen & Nilsson, 2001; Jorgensen et al., 2014), this optimal strategy can be computed with well-established techniques, for example by carrying out local transformations such as arc reversals and node removals (Shachter, 1986, 1988), or by formulating the equivalent decision tree representation and solving it with dynamic programming (Tatman & Shachter, 1990).
While this assumption often holds, there are problems in which it does not, for example, in distributed decision making by agents such as military patrols who may not be able to communicate with each other (see, e.g., Zhang et al., 1994). Examples include problems of adversarial risk analysis in which agents may not be able to observe what decisions the other agents have taken (Rios Insua et al., 2009; Roponen et al., 2020). Another important context is the risk management of safety-critical systems that must not fail even if information cannot be synchronized due to disruptions or communication delays. More generally, dynamic programming is a restrictive solution approach, because the optimal strategy within a branch that unfolds from a given decision node cannot depend on decisions in other branches of the decision tree. Thus, the objective function cannot include risk measures such as Value-at-Risk, which reflects the full variability of consequences across the entire decision tree. Project portfolio selection problems, too, give rise to analogous dependencies, because the consumption of shared resources, for example, implies that optimal decisions for one project are contingent on those for others. Consequently, the optimal strategy for a given project cannot be determined without considering strategies for the other projects (Gustafsson & Salo, 2005).
In this paper, we develop the Decision Programming modeling framework which (i) uses influence diagrams to represent the structure of discrete multi-stage decision problems under uncertainty, including those that cannot be solved with dynamic programming techniques, (ii) extends the problem formulation by allowing for the specification of both deterministic (e.g., logical dependencies, costs arising at one or more nodes) and chance (i.e., probabilistic) constraints, and (iii) converts the resulting problem representations into mixed-integer linear programming (MILP) problems. In particular, this framework is generic enough for solving limited memory influence diagrams (LIMID) in which the 'no forgetting' assumption may not hold. Decision Programming also makes it possible to compute all non-dominated strategies in problems with multiple objectives, represented by multiple value nodes. Importantly, all these modeling features can be incorporated into corresponding MILP problems that can be solved efficiently with available software tools (for a survey, see, e.g., Fourer, 2017), as evidenced by our computational results which demonstrate that problems of considerable size can be solved to optimality.
Our contribution is relevant to Stochastic Programming (Birge & Louveaux, 2011) as it provides a general framework for problems in which decisions are made over several stages and realizations of uncertain events are observed between pairs of successive stages. In the first stage, an initial decision is selected, and subsequent recourse decisions are selected after observing the realizations of uncertain earlier events. We distinguish between endogenous and exogenous uncertainties based on whether earlier decisions can influence conditional probability distributions. Both types of uncertainties can be accommodated in Decision Programming by converting influence diagrams and adjoining constraints into multi-stage stochastic integer programming (MSSIP) problems that can be efficiently solved using off-the-shelf MILP solvers. That is, the diagram is first converted into a sequence of decision and chance nodes. This sequence is then employed when forming the deterministic equivalent MILP formulation of the MSSIP.
The rest of this paper is structured as follows. Section 2 discusses earlier approaches. Section 3 develops the Decision Programming framework. Section 4 presents illustrative examples. Section 5 develops modeling approaches for dealing with risk preferences, chance constraints and multiple objectives. Section 6 gives results on computational performance. Section 7 concludes and provides directions for further development of the framework.

Earlier approaches
Influence diagrams were initially developed in the 1970s (Olmsted, 1983; Howard & Matheson, 1984) to represent informational and probabilistic dependencies between decisions and uncertain chance events which, taken together, determine consequences for the DM. If the 'no-forgetting' assumption holds so that earlier decisions are known when making later ones, and the aim is to maximize expected utility at the value node, these diagrams can be solved with well-established techniques, for instance, by forming an equivalent decision tree that can be solved through dynamic programming (Tatman & Shachter, 1990); or by removing decision and chance nodes from the diagram one-by-one, possibly after arc reversals (see, e.g., Shachter, 1986; Smith et al., 1993; Howard & Matheson, 2005).
As visual tools for problem representation, influence diagrams differ from decision trees in suggesting that the problem has a symmetric structure in which the sets of alternative decisions as well as realizations of chance events do not depend on preceding decision and chance nodes. Still, asymmetric problems can be modeled with influence diagrams through the appropriate definition of node states and their dependencies (see, e.g., Smith et al., 1993). Mathematically, the mapping of input parameters (i.e., probabilities, decisions) to outputs (i.e., expected utilities) in influence diagrams is a piecewise multilinear function (Borgonovo & Tonoli, 2014), a fact that underpins the developments in this paper. For an account of the evolution of influence diagrams, see Bielza et al. (2011).
Problems in which earlier decisions cannot be recalled give rise to LIMIDs which are computationally challenging because optimal strategies cannot be determined through an equally straightforward series of local computations. Zhang et al. (1994) discuss these and other kinds of influence diagrams. Lauritzen & Nilsson (2001) develop an iterative policy updating approach for LIMIDs by solving a series of expected utility maximization problems by message passing in a junction tree derived from the influence diagram. Hovgaard & Brinker (2016) describe an application of LIMIDs to structural damage protection. Mauá & Cozman (2016) study the computational performance of k-neighborhood local search algorithms and propose approximate algorithms. Further optimization formulations for solving influence diagrams in junction trees are presented by Parmentier et al. (2020).
However, these approaches based on local computations and iterative message passing schemes are ill-equipped for problems involving constraints that span across the entire problem (e.g., due to logical interdependencies, limited budgets, bounds on risk levels) and whose fulfilment cannot be determined locally. For example, the DM may seek to maximize the expected net present value (NPV) subject to the requirement that the expectation in the lower tail of the NPV distribution is not too low (i.e., Conditional Value-at-Risk, which is a coherent risk measure; Artzner et al., 1999). These local approaches also encounter difficulties in problems in which several objectives associated with their respective multiple value nodes have to be explicitly addressed (see, e.g., Diehl & Haimes, 2004).
In portfolio decision analysis (Salo et al., 2011), influence diagrams help portray the overall structure of probabilistic and informational dependencies, but they cannot handle constraints arising from limited budgets or logical dependencies between alternatives. For project selection problems, Contingent Portfolio Programming (Gustafsson & Salo, 2005) employs MILP to determine optimal project management strategies when the projects' cash flows are contingent on scenarios whose probabilities cannot depend endogenously on project decisions. Vilkkumaa et al. (2018) extend this approach to single-stage selection problems in which scenario probabilities can depend endogenously on project decisions. Liesiö & Salo (2012) derive decision recommendations for single-stage project selection problems with one objective and possibly incomplete utility and probability information. Yet, none of these earlier approaches are equipped to handle problems in which there is a combination of endogenous uncertainties, several decision stages, and multiple objectives.
Several papers use stochastic programming as the underpinning framework for modeling multi-stage problems under uncertainty. Nevertheless, the literature on endogenous uncertainty in stochastic programming is still sparse, because the existing models depart from domains in which well-performing solution techniques are available, most prominently convex programming in general, and linear programming in particular.
Most of the stochastic programming literature focuses on problems in which decisions can influence the information structure, in particular the timing of unveiling uncertainties, as opposed to the actual probability distributions associated with uncertain events. Goel & Grossmann (2006) develop a stochastic programming formulation for multi-stage problems for the timing of oil well exploitation, which is assumed not to influence the uncertain amount of recoverable oil. Building on developments in Goel & Grossmann (2004), they propose a unified framework and solution methods to handle problems in which the decisions influence the time of observing uncertainties. Gupta & Grossmann (2011) present specialized solution methods for oil and gas field development. Colvin & Maravelias (2008) propose a stochastic programming model for novel product development in pharmaceutical research, further extended by Colvin & Maravelias (2009). In this context, the timing of when uncertainties are resolved is influenced endogenously by the decisions on how to perform clinical trials which, however, leads to computational challenges (Colvin & Maravelias, 2010). Solak et al. (2010) address R&D project portfolio optimization under endogenous uncertainty, acknowledging that the inclusion of decision-dependent uncertainties significantly degrades tractability. To tackle this issue, they propose a sophisticated solution method, exploiting the formulation devised specifically for the problem. Apap & Grossmann (2017) provide a comprehensive recent literature overview and propose an approach for problems with a decision-dependent information structure.
Problems where decisions can (also) affect the probability distributions of uncertain events have been much less explored. The predominant strategy has been to remove decision dependent probabilities using appropriate transformations in the probability measure, as described by Rubinstein & Shapiro (1993) (see also Pflug, 2012), or in the probability distribution itself (cf. Dupacová). In their overview of this scarce literature, Hellemo et al. (2018) propose a taxonomy of distinct classes for stochastic programs with endogenous uncertainties and possible formulation approaches. They also report computational experiments to highlight how challenging these problems are for state-of-the-art optimization solvers.
In fact, multi-stage optimization problems under uncertainty can involve decision dependent probabilities, parameters, and/or information structures (Hellemo et al., 2018). The Decision Programming framework developed here is general enough to encompass all these variants, on condition that each chance event has a finite number of possible realizations and decisions correspond to choices from a finite set of discrete alternatives; this is the case in the majority of the aforementioned approaches and the following development.

Influence diagram representation of the decision problem
Multi-stage decision problems under uncertainty can be modeled as acyclic networks G = (N, A) whose nodes N = C ∪ D ∪ V consist of chance nodes C, decision nodes D, and value nodes V . Chance nodes C represent uncertain events associated with random variables; decision nodes D correspond to decisions among discrete alternatives; and value nodes V represent consequences that result from the realizations of random variables at chance nodes and the decisions made at decision nodes.
Dependencies between nodes are represented by arcs A = {(i, j) | i, j ∈ N }. A path of length k is a sequence of nodes (i 1 , i 2 , . . . , i k ) such that (i l , i l+1 ) ∈ A for all l = 1, . . . , k − 1. The information set of a node j ∈ N , defined as I(j) = {i ∈ N | (i, j) ∈ A}, consists of the direct predecessors of j from which there is an arc to j.
Since the network G is acyclic, the nodes N can be indexed consecutively with integers 1, 2, . . . , |N | (where | · | denotes the number of elements in a set) so that for each node j ∈ N , the indices of the nodes in its information set I(j) are smaller than j (i.e., i < j for all i ∈ I(j)).
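Such a consecutive indexing is simply a topological ordering of the acyclic graph and can be computed, for instance, with Kahn's algorithm. The following Python sketch illustrates this (the function name and container layout are our own, not part of the framework):

```python
from collections import defaultdict

def consecutive_indexing(nodes, arcs):
    """Return 1-based indices for `nodes` such that every node's direct
    predecessors receive smaller indices (Kahn's topological sort)."""
    preds = defaultdict(set)
    succs = defaultdict(set)
    for i, j in arcs:
        preds[j].add(i)
        succs[i].add(j)
    order = []
    ready = [v for v in nodes if not preds[v]]  # nodes with no predecessors
    while ready:
        v = ready.pop()
        order.append(v)
        for w in succs[v]:
            preds[w].discard(v)
            if not preds[w]:
                ready.append(w)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return {v: k + 1 for k, v in enumerate(order)}
```

Any ordering returned by this procedure satisfies i < j for all i ∈ I(j), which is all the subsequent formulations require.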
We denote the number of chance nodes by n C = |C| and the number of decision nodes by n D = |D|. These n = n C +n D chance and decision nodes are indexed as C ∪D = {1, 2, . . . , n}, while the n V = |V | = |N |−n value nodes are indexed as n + 1, . . . , n + n V . For now, we assume that there is a single value node in the influence diagram (the extension to multiple value nodes is covered in Section 5.4). Consequences at this value node are determined by the decisions and the realization of chance events which do not depend on consequences. Thus, there are no arcs (i, j) ∈ A such that i ∈ V and j ∈ C ∪ D.
Each chance and decision node j ∈ C ∪ D has a finite set S j of discrete states. The occurrence of states depends on the possible information states s I(j) ∈ S I(j) = ∏ i∈I(j) S i , defined as combinations of states of all nodes in the information set I(j). For each chance node j ∈ C, these states correspond to realizations of the random variable X j , which depends probabilistically on the states s i of the nodes i ∈ I(j) in the information set of j. For a decision node j ∈ D, each state s j ∈ S j corresponds to a decision that is made based on the information state s I(j) . For brevity, we use X j , j = 1, . . . , n, to denote both random variables which are associated with chance nodes j ∈ C and decision variables which are associated with decision nodes j ∈ D.
Specifically, if j ∈ C is a chance node whose information state is s I(j) , then state s j ∈ S j occurs with the conditional probability
P(X j = s j | X I(j) = s I(j) ), (1)
where X I(j) = s I(j) means that the states of the variables X i in the information set i ∈ I(j) are the same as specified by the information state s I(j) . For each decision node j ∈ D, a local decision strategy Z j : S I(j) → S j is a function that maps each information state in S I(j) to a decision in S j . A (global) decision strategy Z is a set of local decision strategies which contains one local strategy Z j for each decision node j ∈ D. The set of all decision strategies is denoted by Z.

Paths
A path s = (s 1 , s 2 , . . . , s n ) of length n is a sequence of states s i ∈ S i of all chance and decision nodes, i.e., i ∈ C ∪ D for all i = 1, . . . , n. The set S of all paths of length n is the Cartesian product
S = S 1:n = S 1 × S 2 × · · · × S n . (2)
Paths of length k < n are sequences s 1:k = (s 1 , s 2 , . . . , s k ) such that s i ∈ S i , i ≤ k. If s 1:k ∈ S 1:k , k < n, and s k+1 ∈ S k+1 , the state s k+1 can be appended to s 1:k to form the path s 1:k+1 = (s 1 , s 2 , . . . , s k , s k+1 ) ∈ S 1:k+1 .
If s 1:k ∈ S 1:k , k ≤ n, and I ⊆ {1, . . . , k}, then s I is a subsequence of s 1:k for the nodes i ∈ I. Thus, s I is a sequence of length |I| which contains the same states as s 1:k for nodes i ∈ I.
A decision strategy Z ∈ Z is compatible with the path s ∈ S if and only if Z j (s I(j) ) = s j , ∀ Z j ∈ Z, j ∈ D.
Thus, at each decision node j ∈ D, Z j ∈ Z maps the information state s I(j) contained in s to the corresponding decision s j in s. In this case, the probability of path s is P(s | Z) = ∏ i∈C P(X i = s i | X I(i) = s I(i) ). On the other hand, if Z is not compatible with s, it contains some local decision strategy Z j , j ∈ D, such that the information state s I(j) contained in s is mapped to a decision which differs from the state s j of node j in s. As a result, choosing Z means that s cannot occur and therefore P(s | Z) = 0.
More generally, for a given decision strategy Z ∈ Z, the probability of a path s ∈ S can be expressed recursively as a function of the conditional probabilities (1) and local decision strategies so that
P(s | Z) = ∏ i∈C P(X i = s i | X I(i) = s I(i) ) · ∏ j∈D I(Z j (s I(j) ) = s j ), (3)
where the indicator function I( · ) is defined so that I(A) = 1 if the statement A holds and I(A) = 0 otherwise.
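The factorization of P(s | Z) above can be computed directly: multiply the chance-node conditionals and zero out any path that the strategy is incompatible with. The following Python sketch assumes a hypothetical dictionary-based encoding of the diagram (all names are illustrative, not the paper's notation):

```python
def path_probability(path, chance_probs, strategy, info_sets):
    """P(s | Z): product of chance-node conditionals times the indicator
    that every decision in the path matches the strategy.
    `path` maps node -> state; `chance_probs[j]` maps (s_I(j), s_j) -> prob;
    `strategy[j]` maps s_I(j) -> decision; `info_sets[j]` lists predecessors."""
    p = 1.0
    for j, table in chance_probs.items():
        s_I = tuple(path[i] for i in info_sets[j])
        p *= table[(s_I, path[j])]
    for j, Z_j in strategy.items():
        s_I = tuple(path[i] for i in info_sets[j])
        if Z_j[s_I] != path[j]:
            return 0.0  # strategy incompatible with this path
    return p
```

Summing this quantity over all paths compatible with a fixed strategy yields one, which is exploited later as a valid equality.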

Characterizing path probabilities using linear inequalities
A given decision strategy Z ∈ Z assigns probabilities to all paths s 1:k ∈ S 1:k , k = 1, . . . , n, in accordance with (3). However, this expression does not suggest efficient ways of computing these probabilities. One could introduce binary variables taking the values of the indicator functions I(Z j (s I(j) ) = s j ) for all j ∈ D, whose multiplication would lead to a mixed-integer nonlinear programming (MINLP) problem which could be converted into an equivalent MILP. An early version of the Decision Programming approach relied on this strategy, which, despite being feasible, led to a MILP formulation with a weak linear programming relaxation, and was therefore deemed too inefficient for off-the-shelf solvers.
Alternatively, we characterize the probabilities of paths s 1:k ∈ S 1:k , k = 1, . . . , n, through sets of linear inequalities. Towards this end, local decision strategies Z j , j ∈ D, are modelled through corresponding binary variables z(s j | s I(j) ) ∈ {0, 1} such that z(s j | s I(j) ) = 1 if and only if Z j maps the information state s I(j) to the decision s j ∈ S j , i.e.,
z(s j | s I(j) ) = 1 ⟺ Z j (s I(j) ) = s j . (5)
Furthermore, the mutual exclusivity of the decisions is ensured through the constraints
∑ s j ∈S j z(s j | s I(j) ) = 1, ∀ s I(j) ∈ S I(j) , (6)
which ensure that exactly one decision s j ∈ S j is chosen for every information state s I(j) ∈ S I(j) .
For the given decision strategy Z ∈ Z, the corresponding probability π(s) of any path s ∈ S can be derived recursively as follows. To initialize the recursion, let π 0 (s) = 1. Suppose that the probability π k−1 (s) = P(X 1:k−1 = s 1:k−1 | Z) is known and consider the next node k ≤ n. If k ∈ C is a chance node, let
π k (s) = P(X k = s k | X I(k) = s I(k) ) π k−1 (s), (7)
where the first term on the right side of (7) is given by (1). If k ∈ D is a decision node, let
π k (s) = z(s k | s I(k) ) π k−1 (s). (8)
For decision nodes k ∈ D, this assignment corresponds to the inequalities
π k (s) ≤ π k−1 (s), (9)
π k (s) ≤ z(s k | s I(k) ), (10)
π k (s) ≥ π k−1 (s) + z(s k | s I(k) ) − 1, (11)
π k (s) ≥ 0. (12)
Theorem 1 states that the path probabilities implied by any strategy Z can be calculated through the assignment (5)-(8). Importantly, the equivalence between the assignments (5)-(8) and the inequalities (9)-(12) implies that the strategy which maximizes the expectation of a real-valued function over paths can be determined through optimization by employing these inequalities as constraints on the decision variables z(s k | s I(k) ), k ∈ D, s k ∈ S k , s I(k) ∈ S I(k) .
Theorem 1. Let Z ∈ Z be a decision strategy and choose a path s ∈ S. If π k (s), k = 1, . . . , n, and z(s j | s I(j) ), ∀ j ∈ D, satisfy the constraints (5)-(8), then
π k (s) = P(X 1:k = s 1:k | Z), k = 1, . . . , n. (13)
In particular, π(s) def = π n (s) is the probability of the path s for the decision strategy Z.
Proof. See Appendix A.
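The decision-node step (8) and its inequality counterpart (9)-(12) can be sanity-checked numerically: for binary z and π k−1 ∈ [0, 1], the inequalities admit exactly the product value π k = z · π k−1. A minimal Python check (the function name and tolerances are ours):

```python
def linearization_ok(pi_k, pi_prev, z, tol=1e-9):
    """Check that pi_k satisfies the linear inequalities modelling
    pi_k = z * pi_prev for binary z and pi_prev in [0, 1]."""
    return (pi_k <= pi_prev + tol and      # pi_k <= pi_{k-1}
            pi_k <= z + tol and            # pi_k <= z
            pi_k >= pi_prev + z - 1 - tol  # pi_k >= pi_{k-1} + z - 1
            and pi_k >= -tol)              # pi_k >= 0
```

When z = 1, the first and third inequalities pin π k to π k−1; when z = 0, the second and fourth pin it to zero, matching the compatibility argument used later.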

Maximization of expected utility
We assume that at the value node v ∈ V , the function Y v : S I(v) → C maps combinations of states of the nodes in its information set I(v) to a set of consequences C, and that there exists a real-valued utility function U : C → R defined over these consequences. Then, the utility associated with the path s ∈ S can be precomputed as
U(s) = U(Y v (s I(v) )). (14)
Because the path probabilities π(s), s ∈ S, for the selected decision strategy Z ∈ Z are given by Theorem 1, it follows that the decision strategy which maximizes the DM's expected utility is the solution to the optimization problem in Corollary 1.
Corollary 1. The expected utility is maximized by the decision strategy Z ∈ Z which solves the optimization problem
maximize Z∈Z ∑ s∈S π(s) U(s) (15)
subject to constraints (5)-(7) and (9)-(12). In particular, the objective function and constraints in Corollary 1 are linear in the decision variables z(s j | s I(j) ) and the corresponding path probabilities π k (s). This is a MILP problem for which the optimal decision strategy can be computed with off-the-shelf MILP solvers.
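For small diagrams, the optimum of Corollary 1 can be cross-checked by exhaustively enumerating all decision strategies instead of solving the MILP. The following brute-force Python reference is exponential in problem size and uses an illustrative container layout of our own:

```python
from itertools import product

def best_strategy(states, chance_probs, info_sets, decisions, utility):
    """Maximize expected utility over all strategies Z by enumeration.
    `states[j]` lists the states of node j (insertion order topological);
    `chance_probs[j]` maps (s_I(j), s_j) -> prob for chance nodes;
    `decisions` lists decision nodes; `utility` maps a path dict to a value."""
    def info_states(j):
        return list(product(*(states[i] for i in info_sets[j])))
    # every local strategy = one decision per information state
    locals_ = {j: [dict(zip(info_states(j), choice))
                   for choice in product(states[j], repeat=len(info_states(j)))]
               for j in decisions}
    nodes = list(states)
    paths = [dict(zip(nodes, combo))
             for combo in product(*(states[i] for i in nodes))]
    best_eu, best_Z = float("-inf"), None
    for combo in product(*(locals_[j] for j in decisions)):
        Z = dict(zip(decisions, combo))
        eu = 0.0
        for s in paths:
            p = 1.0
            for j in nodes:
                s_I = tuple(s[i] for i in info_sets[j])
                if j in decisions:
                    p *= 1.0 if Z[j][s_I] == s[j] else 0.0  # indicator
                else:
                    p *= chance_probs[j][(s_I, s[j])]
            eu += p * utility(s)
        if eu > best_eu:
            best_eu, best_Z = eu, Z
    return best_eu, best_Z
```

Enumeration is only viable for toy instances; the point of the MILP formulation is precisely to avoid this combinatorial explosion.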

Improving the MILP formulation
To simplify the formulation in Corollary 1, we note that the objective function (15) involves path probabilities π(s) only for full paths s ∈ S = S 1:n of length n. Also, the probability π(s) of each path s ∈ S depends on two separable components. First, for each path s ∈ S, the conditional probabilities (1) of the states s j for chance nodes j ∈ C can be multiplied to obtain the following upper bound for π(s):
p(s) = ∏ j∈C P(X j = s j | X I(j) = s I(j) ). (16)
Second, for a given decision strategy Z ∈ Z, this upper bound p(s) is the actual probability of s if and only if Z is compatible with s. That is, if z(s j | s I(j) ) = 1, ∀ j ∈ D, the inequalities (9)-(12) imply π j (s) = π j−1 (s) for each j ∈ D. This result can be used to solve the equations (7)-(8) recursively from π 0 (s) = 1 up to the last node n, for which π n (s) = p(s) in (16). Conversely, if the decision strategy Z is not compatible with s, inequalities (9)-(10) imply that π n (s) ≤ π j (s) = 0 for some j ∈ D. Thus, because π(s) = π n (s) = p(s) if and only if z(s j | s I(j) ) = 1, ∀ j ∈ D, the optimization problem in Corollary 1 can be reformulated as
maximize Z∈Z ∑ s∈S π(s) U(s) (17)
subject to
∑ s j ∈S j z(s j | s I(j) ) = 1, ∀ j ∈ D, s I(j) ∈ S I(j) , (18)
0 ≤ π(s) ≤ p(s), ∀ s ∈ S, (19)
π(s) ≤ z(s j | s I(j) ), ∀ j ∈ D, s ∈ S, (20)
π(s) ≥ p(s) + ∑ j∈D z(s j | s I(j) ) − |D|, ∀ s ∈ S, (21)
z(s j | s I(j) ) ∈ {0, 1}, ∀ j ∈ D, s j ∈ S j , s I(j) ∈ S I(j) , (22)
where the constraints (18) ensure that some decision s j ∈ S j is made at each decision node j ∈ D for every information state s I(j) ∈ S I(j) (as stated in (6)). Constraints (19) bound the probabilities of paths s ∈ S.
Constraints (20) ensure that only those paths which are compatible with the decision strategy can have positive probabilities. Constraints (21) ensure that the probabilities of paths with negative utility U(s) cannot become smaller than their upper bounds p(s) for paths s such that z(s j | s I(j) ) = 1, ∀ j ∈ D. However, because utility functions are unique up to positive affine transformations and the value node has a finite number ∏ i∈I(v) |S i | of information states, one can choose a utility function with non-negative utilities (i.e., U(s) ≥ 0 for all s ∈ S), which allows for omitting constraints (21). For clarity, we note that in constraints (20)-(21), the states s j , s I(j) are taken from the selected path s ∈ S. Finally, constraints (22) enforce the domain of all binary variables z(s j | s I(j) ).

Valid equalities
Next, we describe valid equalities to strengthen the problem formulation (17)-(22) so that it can be solved more efficiently. These equalities are derived from the problem structure and can help compute the optimal decision strategies, as shown in Section 6. However, adding all of them directly as additional constraints may slow down the overall solution process, especially for larger problems.
Alternatively, one can include these valid equalities during the solution process as "lazy constraints" that can be used by the MILP solver to prune nodes of the branch-and-bound tree more efficiently. One can also add them during the solution process in a cutting plane fashion as "user cuts" for a subset of nodes in the tree based on some criterion (or multiple criteria), for example, if the upper bound has not improved enough within some time interval. Such lazy constraints and user cuts are standard features in off-the-shelf MILP solvers.
Specifically, the first set of equalities, referred to as probability cuts, exploits the fact that for any decision strategy Z ∈ Z, the sum of the path probabilities must equal one, i.e., ∑ s∈S π(s) = 1. These equalities are valid for any problem that can be formulated as (17)-(22). As an example of how a probability cut works as a lazy constraint, suppose that the optimal (fractional) solution of a node in the branch-and-bound tree does not satisfy the probability cut. Then, the problem at that node is re-optimized after adding the probability cut, and if the new optimal value is smaller than the current best primal bound, the node can be discarded.
The second set of equalities can be used in problems whose structure makes it possible to determine in advance, for a given decision strategy Z ∈ Z, how many paths s ∈ S are active in the sense that π(s) > 0. For example, if the number of active paths in any solution is n s , we can define the valid equality ∑ s∈S π(s)/p(s) = n s , where p(s) in (16) is the upper bound for π(s). This approach can be generalized to asymmetric problems in which the number of active paths varies for different decision strategies. In such cases, several equalities can be added to cover the different ways in which the number of active paths depends on the states of decision or chance nodes. Such information, derived from an analysis of symmetries in the problem structure (see, e.g., Bielza et al., 2011), serves to improve computational efficiency.
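Both valid equalities are straightforward to verify on a candidate solution, which mirrors the separation step performed when they are added as lazy constraints. A Python sketch (the function names and tolerances are ours):

```python
def probability_cut(path_probs, tol=1e-9):
    """Probability cut: the path probabilities of any feasible
    strategy must sum to one."""
    return abs(sum(path_probs) - 1.0) <= tol

def active_paths_cut(path_probs, upper_bounds, n_s, tol=1e-9):
    """Active-path cut: sum of pi(s)/p(s) must equal the known
    number n_s of paths with positive probability; paths with
    p(s) = 0 contribute nothing."""
    total = sum(pi / p for pi, p in zip(path_probs, upper_bounds) if p > 0)
    return abs(total - n_s) <= tol
```

In a solver callback, a candidate incumbent that fails either check would be cut off by adding the corresponding equality to the node relaxation.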

Decision Programming without the No-Forgetting Assumption
As an example of a problem in which the no-forgetting assumption does not hold, assume that there is an uncertain load L on a built structure which can be fortified through actions A 1 and A 2 to mitigate the risk of a failure F of this structure. These two decisions are informed by measurement reports R 1 and R 2 of the load L.
The decision as to whether action A 1 should be implemented is informed by the report R 1 only and, similarly, decision A 2 is based on the report R 2 alone. In particular, whether the fortification A 1 will be or has been implemented is not known when making the decision A 2 (and conversely for A 2 ). The utility at the target node T depends on whether or not the structure fails and how much the fortification actions cost.
This problem structure also represents a situation where the reports are generated by sensors which inform safety controls (e.g., valves) that must be activated instantaneously to prevent potential disruptions in a safety-critical system such as a nuclear plant (see, e.g., Mancuso et al., 2019). In particular, safety must be ensured even if failures of communication equipment prevent the sensors from sharing information with a centralised server or other sensors.
Just as in the example in Figure 12 of Zhang et al. (1994), this problem structure is challenging in that the optimal strategy at one decision node depends on that at the other. In particular, the no-forgetting assumption does not hold, because neither decision node has the other in its information set. The chance nodes are C = {L, R 1 , R 2 , F } and the decision nodes are D = {A 1 , A 2 } (the utility at the value node T depends on the failure state and the cost of implementing the fortification actions). By using node labels to indicate the states of the corresponding nodes, the paths are sequences s = (l, r 1 , r 2 , a 1 , a 2 , f ), the upper bounds (16) are p(l, r 1 , r 2 , a 1 , a 2 , f ) = P(l)P(r 1 | l)P(r 2 | l)P(f | l, a 1 , a 2 ), and the decision strategies are defined by Z = (Z 1 , Z 2 ) such that Z i : R i → A i .
Using this notation, the optimal fortification strategy can be obtained by solving the problem (17)-(22), which in this example becomes
maximize Z∈Z ∑ (l,r 1 ,r 2 ,a 1 ,a 2 ,f )∈S π(l, r 1 , r 2 , a 1 , a 2 , f ) U(Y T (a 1 , a 2 , f ))
subject to
∑ a i ∈A i z(a i | r i ) = 1, ∀ i ∈ {1, 2}, r i ∈ R i
0 ≤ π(l, r 1 , r 2 , a 1 , a 2 , f ) ≤ p(l, r 1 , r 2 , a 1 , a 2 , f ), ∀ (l, r 1 , r 2 , a 1 , a 2 , f ) ∈ S
π(l, r 1 , r 2 , a 1 , a 2 , f ) ≤ z(a i | r i ), ∀ i ∈ {1, 2}, (l, r 1 , r 2 , a 1 , a 2 , f ) ∈ S
π(l, r 1 , r 2 , a 1 , a 2 , f ) ≥ p(l, r 1 , r 2 , a 1 , a 2 , f ) + z(a 1 | r 1 ) + z(a 2 | r 2 ) − 2, ∀ (l, r 1 , r 2 , a 1 , a 2 , f ) ∈ S
z(a i | r i ) ∈ {0, 1}, ∀ i ∈ {1, 2}, a i ∈ A i , r i ∈ R i
where Y T (a 1 , a 2 , f ) gives the consequence associated with the failure state f and the actions a 1 and a 2 . If all the decision and chance nodes have binary states, then there are altogether 8 decision variables (4 per fortification decision) and 2 6 = 64 paths, resulting in 4 equality constraints and 128 inequality constraints (in the second set of inequality constraints, the states a i , r i are implied by the selected path, and the third set of inequality constraints can be omitted by normalizing the utility function so that it attains non-negative values only).
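With binary states, the 16 candidate strategies of this example can be enumerated directly, which is useful for validating a MILP implementation. A Python sketch with illustrative probabilities and utilities (none of the numbers below are from the paper):

```python
from itertools import product

# Illustrative numbers: binary load/report/action/failure states.
P_L = {0: 0.5, 1: 0.5}                    # prior on the load L

def P_R(r, l):                            # P(r_i | l): reports 80% accurate
    return 0.8 if r == l else 0.2

def P_F(l, a1, a2):                       # each fortification halves failure prob.
    base = 0.1 if l == 0 else 0.7
    return base * 0.5 ** (a1 + a2)

def utility(a1, a2, f):                   # action cost 10 each, failure loss 100
    return -10 * (a1 + a2) - 100 * f

best, best_Z = float("-inf"), None
for Z1 in product([0, 1], repeat=2):      # Z1[r1] -> a1
    for Z2 in product([0, 1], repeat=2):  # Z2[r2] -> a2
        eu = 0.0
        for l, r1, r2, f in product([0, 1], repeat=4):
            a1, a2 = Z1[r1], Z2[r2]
            pf = P_F(l, a1, a2)
            p = P_L[l] * P_R(r1, l) * P_R(r2, l) * (pf if f else 1 - pf)
            eu += p * utility(a1, a2, f)
        if eu > best:
            best, best_Z = eu, (Z1, Z2)
```

After the loop, `best_Z` holds a pair of local strategies mapping each report to an action, and `best` equals the corresponding expected utility.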

Decision Programming as an Extension of Contingent Portfolio Programming
Contingent Portfolio Programming (Gustafsson & Salo, 2005) is a methodology for determining optimal decision strategies for multi-period investment projects whose cash flows depend on (i) uncertain states of nature and (ii) project management decisions. The aim is to maximize the expected resource position at the terminal period, subject to relevant resource and consistency constraints. Risk preferences can be accounted for either by formulating risk constraints or by introducing risk measures into the objective function. CPP problems of realistic size can be tackled with off-the-shelf MILP solvers. However, a limitation of CPP is that the probabilities of the states of nature cannot depend on project decisions. Yet, such dependencies arise for instance when the projects influence market and regulatory uncertainties (for a case study, see Vilkkumaa et al., 2018).
Decision Programming generalizes CPP models so that the probabilities of the states of nature in the scenario tree can depend endogenously on project decisions. We illustrate this by extending the example in Gustafsson & Salo (2005) with two projects A and B of which one or both can be started in period 0. If a project is started, it can be continued in period 1, in which case it gives in period 2 a payoff that depends on the two chance events C 1 and C 2 which correspond to uncertain upward and downward movements in the CPP scenario tree in periods 1 and 2, respectively. Yet, a computational challenge with the approach of embedding project decisions in paths is that the paths tend to become prohibitively long, because each project increases the path length by the number of decisions for the project. This challenge can be addressed by introducing decision nodes whose states represent the aggregate portfolio-level performance achieved through the selected projects. The chance events defining the scenario tree can then be made contingent on these performance levels, while decisions concerning portfolio-level performance can be employed as constraints on the project-specific selection decisions.

For instance, assume that the first-stage decisions specify which technology development projects will be started to generate patent-based intellectual property (P) for a platform. This intellectual property contributes, subject to some uncertainties, to the technical competitiveness (T) of the platform. In the second stage, it is possible to carry out application (A) development projects which, when completed, yield cash flows that depend on the market share of the platform. This market share (M) depends on the competitiveness of the platform and the number of developed applications. The aim is to maximize the cash flows from application projects less the cost of technology and application development projects.
The structure of this problem can be modeled by introducing decision nodes $D_P$, $D_A$ for the development of patents and applications, respectively, so that the states of these nodes correspond to contiguous ranges for the number of patents and applications that can be generated. In other words, these states represent discretizations of these ranges, indexed by $i = 1, \ldots, |D_P|$ and $k = 1, \ldots, |D_A|$.
The technical competitiveness of the platform and its market share are represented by chance nodes $C_T$ and $C_M$ whose realizations are denoted by the states $c^T_j$ and $c^M_l$, indexed by $j = 1, \ldots, |C_T|$ and $l = 1, \ldots, |C_M|$. The dependencies at these chance nodes can be characterized by estimating the probabilities $p(c^T_j \mid d^P_i)$ and $p(c^M_l \mid c^T_j, d^A_k)$ for the relevant combinations of states. At the aggregate portfolio level, the decisions consist of (i) choosing the number of patents to be generated by selecting the interval $d^P_i = [\underline{d}^P_i, \overline{d}^P_i)$ such that $z(d^P_i) = 1$ and, based on this decision and the ensuing state of technical competitiveness $c^T_j$, (ii) deciding the number of applications to be developed by choosing the state $d^A_k = [\underline{d}^A_k, \overline{d}^A_k)$ such that $z(d^A_k \mid d^P_i, c^T_j) = 1$. Thus, the corresponding Decision Programming paths are sequences $s = (d^P_i, c^T_j, d^A_k, c^M_l)$ whose probabilities $\pi(s)$ are characterized by the inequalities (18)-(22) in Section 3 so that $\pi(s) = p(c^T_j \mid d^P_i)\, p(c^M_l \mid c^T_j, d^A_k)$ if $z(d^P_i) = z(d^A_k \mid d^P_i, c^T_j) = 1$ and $\pi(s) = 0$ otherwise (23). The portfolio-level decisions $d^P_i$ and $d^A_k$ can be linked to the selection of technology and application projects as follows. Assume there are $n_T$ technology projects such that project $t \in \{1, \ldots, n_T\}$ requires the investment $I_t$ and produces $O_t$ patents as its output, as well as $n_A$ application projects such that project $a \in \{1, \ldots, n_A\}$ requires the investment $I_a$ and generates $O_a$ applications. If completed, the application project $a$ provides the cash flow $V(a \mid c^M_l)$ when the market share is $c^M_l$.
The selection of technology projects is indicated by binary variables $x^T_i(t)$ which indicate whether or not the technology project $t$ is selected when the number of patents belongs to the interval $d^P_i$; similarly, the binary variables $x^A_k(a \mid d^P_i, c^T_j)$ indicate which application projects are selected (note that the competitiveness of the technology is known when starting application projects). Using these variables, the realized cash flow $U(s)$ for the path $s$ consists of the application projects' cash flows less the investment costs of the selected technology and application projects (24). To ensure that the selections of technology and application projects are aligned with the decisions $z(d^P_i)$ and $z(d^A_k \mid d^P_i, c^T_j)$, constraints (25)-(27) link the project variables to these decisions, where $M$ is a large constant and the terms for $\varepsilon = \frac{1}{2}\min_{t,a}\{O_t, O_a\}$ eliminate the possibility of making selections which reach the upper bound of the interval. By constraint (28), the number of patents $\sum_t O_t x^T_i(t)$ lies in the selected interval $[\underline{d}^P_i, \overline{d}^P_i)$, while (29) ensures that the number of applications $\sum_a O_a x^A_k(a \mid d^P_i, c^T_j)$ lies in the interval $[\underline{d}^A_k, \overline{d}^A_k)$.
By construction, the terms in the cash flow expression (24) are linear in the project selection variables. Using these and noting (23), each term $\pi(s)U(s)$ in (15) can be written as a linear expression. By constraints (25)-(27), the project selections $x^T_i(t)$ and $x^A_k(a \mid d^P_i, c^T_j)$ can affect the objective function only for the selected intervals $z(d^P_i) = z(d^A_k \mid d^P_i, c^T_j) = 1$, while the constraints (28)-(29) imply that the project portfolios are indeed aligned with these selections. Thus, we have an MILP problem involving both the decision variables $z(d^P_i)$ and $z(d^A_k \mid d^P_i, c^T_j)$, as well as the variables for the selection of technology and application projects. Importantly, in this problem structure the number of paths $|D_P| \times |C_T| \times |D_A| \times |C_M|$ stays the same regardless of the size of the sets from which technology and application development projects are selected. Nor does the number of constraints grow with the number of project candidates. This helps solve much bigger problems while still accounting for the endogenous impact that the selected projects have on technological competitiveness and market share. In effect, optimization becomes indispensable when there are more candidate projects to be selected from. For instance, from a set of 30 candidate projects one can build $2^{30} \approx 1.07 \times 10^9$ portfolios, wherefore explicit enumeration is no longer a viable approach. A further benefit of this layered structure is that the elicitation of conditional probabilities for the chance nodes can be largely separated from the consideration of individual technology and application projects. Still, the compatibility of the two layers is ensured by the optimization model, which can readily accommodate many kinds of constraints, such as those representing budgets, risk preferences or logical dependencies.
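To make the path-probability structure concrete, the following sketch (in Python; all data, dimensions and names are illustrative, not taken from the paper) evaluates $\pi(s)$ for a small instance with two patent intervals and two application intervals: a path receives the product of its chance probabilities when both of its decision states are selected by the strategy, and probability zero otherwise.

```python
import itertools

# Hypothetical discretization: two patent intervals, two competitiveness
# states, two application intervals, two market-share states.
P_T = {  # p(c_T | d_P): competitiveness distribution given patent interval
    0: [0.7, 0.3],
    1: [0.4, 0.6],
}
P_M = {  # p(c_M | c_T, d_A): market-share distribution
    (0, 0): [0.8, 0.2], (0, 1): [0.6, 0.4],
    (1, 0): [0.5, 0.5], (1, 1): [0.2, 0.8],
}

def path_probability(d_P, c_T, d_A, c_M, z_P, z_A):
    """pi(s) for the path s = (d_P, c_T, d_A, c_M): the product of the
    chance probabilities if the strategy selects both decision states,
    and zero otherwise."""
    if z_P[d_P] and z_A[(d_P, c_T)] == d_A:
        return P_T[d_P][c_T] * P_M[(c_T, d_A)][c_M]
    return 0.0

# One strategy: choose patent interval 1, then pick the application
# interval contingent on the realized competitiveness state.
z_P = {0: False, 1: True}
z_A = {(1, 0): 0, (1, 1): 1, (0, 0): 0, (0, 1): 0}

total = sum(path_probability(dp, ct, da, cm, z_P, z_A)
            for dp, ct, da, cm in itertools.product(range(2), repeat=4))
print(total)  # the probabilities of the active paths sum to 1
```

The number of paths here is fixed at $2 \times 2 \times 2 \times 2 = 16$ regardless of how many candidate projects underlie each interval, which is the point of the layered structure.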

Extensions to modeling chance constraints and multiple value nodes
We next extend the formulation in Section 3 with approaches for modeling risk measures and chance constraints, as well as for computing non-dominated strategies in problems with multiple objectives represented by different value nodes.
Apart from the use of nonlinear utility functions $U(\cdot)$ in (14), risk preferences can be accounted for through risk measures $\rho$ that map decision strategies to non-negative real numbers and can be introduced as additional terms in the objective function or employed as constraints. In the following, we assume that, at the value nodes $v \in V$, the aim is to maximize the consequences $C(s) = Y_v(s_{I(v)}) \in \mathbb{C}$, which are assessed as real numbers.

Absolute and lower-semi absolute deviation
Let $t \in \mathbb{R}$ be a given target level for consequences and define the non-negative deviation variables $\Delta^+_t(s) = \max\{C(s) - t, 0\}$ and $\Delta^-_t(s) = \max\{t - C(s), 0\}$ (30). By construction, $\Delta^+_t(s)$ (respectively $\Delta^-_t(s)$) measures how much the consequence $C(s)$ is above (below) the target level $t$. The deviations (30) can be precomputed for the information states $s_{I(v)} \in S_{I(v)}$ at the value node $v$.
The expected downside risk (EDR) of a decision strategy $Z \in \mathbb{Z}$ relative to the target level $t$ is $\rho_{EDR}(Z) = \sum_{s \in S} \pi(s) \Delta^-_t(s)$. If $t$ is chosen to be the expected value of consequences $\mathbb{E}[C \mid Z] = \sum_{s \in S} \pi(s) Y_v(s_{I(v)})$ for the decision strategy $Z$, the corresponding non-negative deviation (decision) variables $\Delta^+(s)$ and $\Delta^-(s)$ capture the deviations from $\mathbb{E}[C \mid Z]$. The absolute deviation (AD) and the lower semi-absolute deviation (LSAD) are then given by $\rho_{AD}(Z) = \sum_{s \in S} \pi(s) \bigl[\Delta^+(s) + \Delta^-(s)\bigr]$ and $\rho_{LSAD}(Z) = \sum_{s \in S} \pi(s) \Delta^-(s)$. These measures can be used to augment the objective function through an additional additive term which penalizes risk. For example, if the aim is to maximize expected consequences while accounting for risks through (lower semi-)absolute deviation, one possibility is to formulate the objective function as $\max\ \mathbb{E}[C \mid Z] - w \rho(Z)$, where $w \geq 0$ is a weighting coefficient that reflects the DM's risk aversion. Alternatively, as an example of using risk measures to constrain feasible decision strategies, assume that the consequences are defined as profits reported in kUSD. Then the constraint $\rho_{AD}(Z) \leq 10$ would rule out any strategy $Z \in \mathbb{Z}$ for which profits can be expected to differ by more than 10 kUSD from the level of its expected profits $\mathbb{E}[C \mid Z]$.
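The deviation measures above can be evaluated directly from the discrete distribution over consequences. The sketch below (Python; the example distribution is illustrative) computes AD and LSAD around the expected consequence, matching the definitions just given.

```python
def ad_lsad(paths, target=None):
    """Absolute and lower semi-absolute deviation of a discrete consequence
    distribution given as (probability, consequence) pairs."""
    if target is None:  # default: deviations around the expected consequence
        target = sum(p * c for p, c in paths)
    ad = sum(p * abs(c - target) for p, c in paths)          # E[D+ + D-]
    lsad = sum(p * max(target - c, 0.0) for p, c in paths)   # E[D-]
    return ad, lsad

# Illustrative (pi(s), C(s)) pairs for the active paths of some strategy.
paths = [(0.25, 0.0), (0.50, 10.0), (0.25, 20.0)]
ad, lsad = ad_lsad(paths)
print(ad, lsad)  # mean is 10, so AD = 5.0 and LSAD = 2.5
```

In the MILP formulation these quantities are expressed with linear constraints on the deviation variables rather than with `max`, but the numerical values coincide.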

Chance constraints and Value-at-Risk
Probabilistic chance constraints can be modeled as linear inequalities on the path probabilities $\pi(s)$, which depend linearly on the decision variables. For example, to assess whether the consequences $C(s)$ meet or exceed the stated target level $t \in \mathbb{R}$, we define the parameters $\Lambda_t(s) = 1$ if $C(s) \geq t$ and $\Lambda_t(s) = 0$ otherwise. If the outcome is required to reach the target level $t$ with a probability that is higher than or equal to a stated threshold level $p_t$, we can impose the constraint $\sum_{s \in S} \pi(s) \Lambda_t(s) \geq p_t$ (35), which is linear in the path probabilities $\pi(s)$. The terms $\Lambda_t(s)$, $\forall s \in S$, need to be defined only for $s_{I(v)} \in S_{I(v)}$.
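Because the $\Lambda_t(s)$ are fixed parameters, the left-hand side of such a chance constraint is just a probability-weighted sum over the paths. A minimal sketch (Python; the distribution and threshold are illustrative):

```python
def target_probability(paths, t):
    """P(C(s) >= t), the left-hand side of the chance constraint:
    Lambda_t(s) = 1 exactly when the consequence reaches the target t."""
    return sum(p for p, c in paths if c >= t)

# Illustrative (pi(s), C(s)) pairs for one decision strategy.
paths = [(0.2, 5.0), (0.5, 12.0), (0.3, 20.0)]
lhs = target_probability(paths, t=10.0)
print(lhs >= 0.7)  # constraint satisfied for threshold p_t = 0.7: True
```

In the MILP, the same sum is written over the path probability variables, keeping the constraint linear.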
In the present context where the probability distributions over consequences are discrete, the Value-at-Risk (VaR) measure for the strategy $Z$ can be defined as $\mathrm{VaR}_\alpha(Z) = F_Z^{-1}(\alpha)$ (36), where $F_Z^{-1}$ is the inverse function of the cumulative distribution function $F_Z : \mathbb{C} \to [0, 1]$ defined as $F_Z(t) = \sum_{s \mid C(s) \leq t} \pi(s)$.
Because the probability distribution over the set of paths is discrete, the definition (36) means that consequences which are less than or equal to $\mathrm{VaR}_\alpha(Z)$ can occur with a probability greater than $\alpha$ (Rockafellar & Uryasev, 2002). This is the case if $\mathrm{VaR}_\alpha(Z)$ coincides with a consequence at which the cumulative distribution function jumps from a level below $\alpha$ to one that exceeds $\alpha$, so that $\mathbb{P}\{s \mid C(s) < \mathrm{VaR}_\alpha(Z)\} < \alpha < \mathbb{P}\{s \mid C(s) \leq \mathrm{VaR}_\alpha(Z)\}$.
Constraints such as (35) can be employed to introduce VaR requirements. That is, if the probability $\alpha > 0$ is associated with the corresponding VaR level $t^\alpha_{VaR}$, then the path probabilities of feasible decision strategies must satisfy the constraint $\sum_{s \in S} \pi(s) \Lambda_{t^\alpha_{VaR}}(s) \geq 1 - \alpha$. This approach can be generalized to introduce chance constraints on the states of nodes $k \in C \cup D$ as well. For instance, assume that the state at node $k$ needs to be in some set $\bar{S}_k \subset S_k$ with a probability which is less than or equal to $\bar{p}_k$. This requirement can be represented by the constraint $\sum_{s \in S} \pi(s) \Lambda_{\bar{S}_k}(s) \leq \bar{p}_k$, where $\Lambda_{\bar{S}_k}(s) = 1$ if $s_k \in \bar{S}_k$ and $\Lambda_{\bar{S}_k}(s) = 0$ otherwise. Thus, for example, for a decision node $k \in D$, one could require that the probability of having to employ extraordinary decisions, as represented by the states $\bar{S}_k$, does not exceed a pre-specified probability $\bar{p}_k$.
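For a fixed strategy, the VaR level of its discrete consequence distribution can be read off the cumulative distribution directly. The following sketch (Python; the distribution is illustrative) returns the smallest consequence at which the cdf reaches $\alpha$, matching definition (36).

```python
def var_alpha(paths, alpha):
    """VaR_alpha for a discrete distribution given as (probability,
    consequence) pairs: the smallest consequence t with F_Z(t) >= alpha."""
    cum = 0.0
    for p, c in sorted(paths, key=lambda pc: pc[1]):
        cum += p
        if cum >= alpha - 1e-12:  # small tolerance for float accumulation
            return c
    return max(c for _, c in paths)

# Illustrative (pi(s), C(s)) pairs: a 5% chance of a large loss.
paths = [(0.05, -30.0), (0.10, -5.0), (0.85, 40.0)]
print(var_alpha(paths, 0.10))  # -5.0: the 10% lower tail ends at -5
```

Note that $F_Z(-5) = 0.15 > 0.10$ here, illustrating the jump behavior discussed above.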

Conditional Value-at-Risk
For a given probability $\alpha > 0$ and a decision strategy $Z \in \mathbb{Z}$, the Conditional Value-at-Risk (CVaR) is the expected level of consequences in the event that the realized consequence is in the $\alpha \in (0, 1]$ lower tail of the probability distribution. Contributions to this expectation come from (i) paths $s \in S^{<}_{\mathrm{VaR}_\alpha(Z)} = \{s \in S \mid C(s) < \mathrm{VaR}_\alpha(Z)\}$ which lead to consequences strictly less than $\mathrm{VaR}_\alpha(Z)$, and (ii) paths $s \in S^{=}_{\mathrm{VaR}_\alpha(Z)} = \{s \in S \mid C(s) = \mathrm{VaR}_\alpha(Z)\}$ which lead to the consequence $\mathrm{VaR}_\alpha(Z)$. The share of the probability of these latter paths that needs to be accounted for in the computation of the CVaR level is the difference $\alpha - \mathbb{P}(\{s \mid C(s) < \mathrm{VaR}_\alpha(Z)\}) = \alpha - \sum_{s \in S^{<}_{\mathrm{VaR}_\alpha(Z)}} \pi(s)$. Thus, as in Liesiö & Salo (2012), we define the risk measure $\mathrm{CVaR}_\alpha(Z)$ as
$$\mathrm{CVaR}_\alpha(Z) = \frac{1}{\alpha}\Bigl[\sum_{s \in S^{<}_{\mathrm{VaR}_\alpha(Z)}} \pi(s)\, C(s) + \Bigl(\alpha - \sum_{s \in S^{<}_{\mathrm{VaR}_\alpha(Z)}} \pi(s)\Bigr) \mathrm{VaR}_\alpha(Z)\Bigr].$$
By Proposition 1, the VaR and CVaR levels for a given probability level $\alpha > 0$ and decision strategy $Z \in \mathbb{Z}$ can be determined by solving the optimization problem (40). Proposition 1. Choose $\alpha \in (0, 1]$ and let $\pi(s)$, $\forall s \in S$, be the path probabilities for a decision strategy $Z \in \mathbb{Z}$.
Then the optimization problem (40)-(50) has a solution such that the optimal value $\eta^* = \mathrm{VaR}_\alpha(Z)$ and $\mathrm{CVaR}_\alpha(Z) = \frac{1}{\alpha} \sum_{s \in S} \rho(s) C(s)$.

Proof. See Appendix A.
An inspection of the proof of Proposition 1 shows that for any feasible solution to the constraints (40)-(50), the expression $\sum_{s \in S} \rho(s) C(s) / \alpha$ gives the correct $\mathrm{CVaR}_\alpha(Z)$ risk measure for $Z$. Thus, if the expectation of consequences in the lower $\alpha$-tail of the probability distribution over consequences is required to be greater than or equal to the lower bound $t^\alpha_{CVaR}$, this requirement can be enforced by adding the constraints (40)-(50) and $\sum_{s \in S} \rho(s) C(s) \geq \alpha t^\alpha_{CVaR}$ to (18)-(22). One approach to addressing trade-offs between the maximization of conditional expectations for different levels of $\alpha$ is to treat these as different objectives with respective weighting coefficients. Thus, combining the unconditional expectation with the selected $\alpha \in (0, 1)$ for CVaR leads to the problem $\max_{Z \in \mathbb{Z}}\ w\, \mathbb{E}[C \mid Z] + (1 - w)\, \mathrm{CVaR}_\alpha(Z)$ (51), where the parameter $w \in (0, 1)$ represents trade-offs between (i) the overall expectation in the first term of (51) and (ii) the expectation in the lower $\alpha$-tail as expressed by the second term. Specifically, the ratio $\frac{1-w}{w}$ indicates how much of the overall expectation the DM is willing to give up in return for improving the CVaR level by one unit, regardless of the level of the overall expectation.
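For a fixed strategy, the CVaR definition above can be evaluated directly: sum the strict lower tail and place the residual probability mass at the VaR level. A minimal sketch (Python; the distribution is illustrative):

```python
def cvar_alpha(paths, alpha):
    """CVaR_alpha for a discrete distribution of (probability, consequence)
    pairs, following the definition in the text."""
    paths = sorted(paths, key=lambda pc: pc[1])
    # VaR: smallest consequence at which the cdf reaches alpha.
    cum, var = 0.0, paths[-1][1]
    for p, c in paths:
        cum += p
        if cum >= alpha - 1e-12:
            var = c
            break
    below = sum(p * c for p, c in paths if c < var)   # strict lower tail
    p_below = sum(p for p, c in paths if c < var)
    # Residual mass (alpha - p_below) is placed at the VaR level.
    return (below + (alpha - p_below) * var) / alpha

# Illustrative (pi(s), C(s)) pairs: a 5% chance of a large loss.
paths = [(0.05, -30.0), (0.10, -5.0), (0.85, 40.0)]
print(cvar_alpha(paths, 0.10))  # (0.05*(-30) + 0.05*(-5)) / 0.10 = -17.5
```

As expected, the CVaR level (-17.5) lies below the VaR level (-5), since it averages over the whole lower tail.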

Multiple value nodes and objectives
The consideration of CVaR levels together with the maximization of expected consequences is an example of the more general case with multiple objectives $n_V > 1$. In this case, attention can be focused on non-dominated strategies $Z \in \mathbb{Z}_{ND}$ such that there is no other feasible strategy $Z' \in \mathbb{Z}_F$ whose expectation is equal to or higher than that of $Z$ at each value node and strictly higher for at least one value node, i.e., $\mathbb{E}[C_v \mid Z'] \geq \mathbb{E}[C_v \mid Z]$ for all $v \in V$ (52), where $\mathbb{E}[C_v \mid Z] = \sum_{s \in S} \pi(s) C_v(s)$ denotes the expectation at value node $v \in V$ and the inequality is strict for at least one value node $v \in V$.
Because the strategies are choices from a discrete set of alternatives, this is a discrete multi-objective optimization (MOO) problem in which the objectives correspond to the maximization of expectations at the different value nodes. Thus, it can be solved with algorithms for this problem class. Holzmann & Smith (2018) provide an extensive review and propose an algorithm based on the augmented Tchebycheff norm, in which choices about the initial step size need to be made.
The weighting approach in (51) or, more generally, the maximization of the expression $\sum_{v \in V} w_v \mathbb{E}[C_v \mid Z]$ can be employed to generate non-dominated strategies. However, a shortcoming of the weighting approach is that it does not necessarily generate all non-dominated strategies even if all non-negative weighting coefficients $w_v \geq 0$, $\forall v \in V$, such that $\sum_{v \in V} w_v = 1$ are employed. This will be the case if a non-dominated strategy $Z \in \mathbb{Z}_{ND}$ is dominated by a weighted linear combination of other non-dominated strategies $Z_1, \ldots, Z_k \in \mathbb{Z}_{ND}$, so that $\mathbb{E}[C_v \mid Z] \leq \sum_{j=1}^{k} \lambda_j \mathbb{E}[C_v \mid Z_j]$ for all $v \in V$ for some selection of positive weights $\lambda_1, \ldots, \lambda_k$ with $\sum_{j=1}^{k} \lambda_j = 1$. This notwithstanding, the weighting approach can be adapted to generate all non-dominated strategies.
First, if $Z' \in \mathbb{Z}_{ND}$ is a non-dominated strategy, then it can be excluded when computing further candidates for non-dominated strategies by introducing the linear constraint (53), where $z'(s_i \mid s_{I(i)})$, $s_i \in S_i$, $s_{I(i)} \in S_{I(i)}$, are the decision variables for $Z'$. In (53), the left side for strategy $Z$ will be greater than or equal to one if and only if $Z$ differs from $Z'$.
Second, if $Z' \in \mathbb{Z}_{ND}$, then further candidates for non-dominated strategies must not be dominated by $Z'$. A necessary condition for this can be formulated by defining the binary variables $\lambda^+_{Z',v}(Z)$ and $\lambda^-_{Z',v}(Z)$ through the constraints (54)-(55), where $M$ is a large constant (e.g., slightly greater than $c^* = \max_{s \in S} C(s)$). For any strategy $Z$ whose expectation at value node $v$ equals that of $Z'$, the values $\lambda^-_{Z',v}(Z) = 1$, $\lambda^+_{Z',v}(Z) = 0$ can be switched to $\lambda^-_{Z',v}(Z) = 0$, $\lambda^+_{Z',v}(Z) = 1$, in which case the constraints (54)-(55) are still satisfied. Thus, for any strategy $Z$ which is not dominated by $Z'$ there will exist a solution such that (56) holds. The constraints (54)-(55) and (56) constitute a necessary but not a sufficient condition, so that a candidate strategy satisfying them may still be dominated by $Z'$; constraint (53) can then be employed to eliminate such a strategy from further consideration. Adding this constraint to (54)-(55) for $Z'$ does not prevent the computation of alternative strategies whose expectations are the same for all value nodes, as such strategies do not dominate each other.
Next, the algorithm can be iterated by maximizing $\sum_{v \in V} w_v \mathbb{E}[C_v \mid Z]$ to generate further candidate strategies and by augmenting the sets of non-dominated strategies and constraints accordingly. Because the number of non-dominated strategies is finite, the algorithm will generate them all by construction. In general, the algorithm is likely to perform well if the generation of non-dominated strategies in the early stages helps eliminate many dominated strategies. This would be the case, for instance, if there is a strong positive correlation between the expectations at the different value nodes.
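The dominance relation (52) itself is easy to check for any finite set of candidate strategies once their value-node expectations are known. The sketch below (Python; the expectations are illustrative, and in practice the candidates would come from the MILP iterations rather than a fixed list) filters a candidate set down to its non-dominated members.

```python
def non_dominated(strategies):
    """Keep strategies, given as tuples of value-node expectations, that are
    not dominated per (52): no other strategy is >= in every coordinate
    and strictly > in at least one."""
    def dominates(a, b):
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))
    return [s for s in strategies
            if not any(dominates(t, s) for t in strategies if t != s)]

# Illustrative expectations (E[C_1|Z], E[C_2|Z]) for five candidates.
strategies = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (2.0, 1.0), (1.5, 1.5)]
print(non_dominated(strategies))  # [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
```

Note that (2.0, 2.0) survives even though it is dominated by a 50/50 mixture of (3.0, 1.0) and (1.0, 3.0); this is exactly the case in which the pure weighting approach would fail to generate it.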

Computational experiments
We next report results from computational experiments to demonstrate the practical viability of Decision Programming and to illustrate how its performance scales with increasing problem size. All implementations were coded in Julia 1.1.0, using the JuMP package to build models which were solved with Gurobi 8.1.0.

N-monitoring instances
The N-monitoring problem has the same structure as the double monitoring problem in Section 4.1 except that there are N binary reinforcement decisions, each of which is informed by its own load report with possible states low and high. For every problem size, we solve 100 instances with randomly generated data, both with and without the probability cuts described in Section 3.6.
Data sets with plausible characteristics were generated as follows. The utility of the structure not failing was set to 100 and that of failing to 0. For the load node L, the probability of the high load state was generated from the uniform distribution U(0, 1) over the unit interval and the remaining probability was assigned to the low load state. For each load level and report, the probability of receiving a correct report was taken to be max{x, 1 − x}, where x was generated from the uniform distribution U(0, 1). Further realizations x, y from U(0, 1) were used to set the prior probability of failure in the case of high load to max{x, 1 − x} and that in the case of low load to min{y, 1 − y}. The costs $c_i$, $i = 1, \ldots, N$, of the fortification actions were also generated from U(0, 1). The posterior probability of failure after implementing the actions $A \subseteq \{1, \ldots, N\}$ was taken to be the prior divided by $e^{\sum_{i \in A} c_i}$, meaning that these actions could only decrease the probability of failure and that the more costly actions were more effective in doing so. In particular, this is an example of a portfolio problem with endogenous uncertainties in which the probability of failure is impacted by the portfolio of fortification decisions. Table 1 shows the computational times in seconds needed to solve the randomly generated instances, comparing the computational performance with and without the probability cuts discussed in Section 3.6. The results are reported in terms of the average (A) and standard deviation (SD) over 100 replications. A time limit of 25 200 seconds (7 hours) was imposed on all experiments. The entry "-" denotes cases for which no solution could be found within the 7 h time limit. In the case with N = 9, for example, there are $4^9 = 262\,144$ different decision strategies. As can be observed, the inclusion of the probability cuts greatly improves the performance of the solver.
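The data-generation scheme described above can be sketched as follows (Python; function and variable names are ours, not from the paper's Julia code, and the seed is arbitrary). The last lines check the key structural property: fortification actions can only reduce the failure probability.

```python
import math
import random

def n_monitoring_instance(n, rng):
    """Random data for one N-monitoring instance, mirroring the scheme
    described in the text."""
    p_high = rng.random()                         # P(load = high)
    # Probability of a correct load report for each of the n decisions.
    report_ok = [max(x, 1 - x) for x in (rng.random() for _ in range(n))]
    x, y = rng.random(), rng.random()
    p_fail = {"high": max(x, 1 - x),              # prior failure prob., high load
              "low": min(y, 1 - y)}               # prior failure prob., low load
    costs = [rng.random() for _ in range(n)]      # fortification costs c_i
    return p_high, report_ok, p_fail, costs

rng = random.Random(0)
p_high, report_ok, p_fail, costs = n_monitoring_instance(4, rng)

# Posterior failure probability after implementing actions A = {0, 2}:
posterior = p_fail["high"] / math.exp(costs[0] + costs[2])
print(0.0 < posterior < p_fail["high"])  # actions only reduce failure risk: True
```

Since every cost is positive, $e^{\sum_{i \in A} c_i} > 1$ whenever $A \neq \emptyset$, so the posterior is always strictly below the prior, and costlier actions reduce it more.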

The pig farm problem
In the pig farm problem (for details, see Lauritzen & Nilsson, 2001), a veterinary doctor visits a pig farm each month to test each pig for a disease and decides, based on the uncertain test result, whether or not to inject the pig with a drug which has both curing and preventive effects but which comes at a cost. After four months, the pigs are sold, whereby healthy pigs command a higher market price than diseased ones. There is no access to individual records for each pig, and thus the doctor has to make the treatment decision based on the age of the pig and the most recent test result. This problem is represented by the limited memory influence diagram (LIMID) in Figure 3.

Figure 3: The pig farm problem with three decision periods (Lauritzen & Nilsson, 2001).
Despite its practical relevance and conceptual simplicity, this problem is not soluble (meaning that the global optimum is not obtainable by solving a sequence of local optimization problems formulated for the different nodes; for details, see Lauritzen & Nilsson, 2001, Definition 14), and consequently the Single Policy Update method proposed by the authors is not guaranteed to find globally optimal solutions. With Decision Programming, this problem can nevertheless be modeled and solved to global optimality rather efficiently. Table 2 presents the optimal solutions and their computation times both for the original four-month version of the problem with three decision periods (in which there are 64 different strategies, corresponding to the 4 × 4 × 4 combinations of the four local decision strategies in each of these three months), as well as for extensions of the same problem up to seven monthly decision periods with the same numerical parameters. As the number of periods grows, solving the problem through explicit enumeration becomes increasingly impractical. In the case of seven periods, for example, there are 4^7 = 16 384 decision strategies.
In Figure 4, the four non-dominated strategies are connected and marked with orange circles, while the remaining 60 dominated strategies are marked with blue circles. Going from left to right, the first non-dominated strategy has the highest expected utility, while the fourth one has the highest conditional lower tail expectation.
The treatment policies in these non-dominated strategies are, respectively, as follows:

4. Never treat at any of the 3 months.
Thus, the local strategy of never treating in the first month is a robust decision recommendation, because it is contained in all non-dominated strategies and consequently belongs to the set of 'core' selections in the Robust Portfolio Modelling (RPM) framework (Liesiö et al., 2007, 2008). Moreover, all local strategies involving treatments based on negative test results can be ruled out from consideration, because they are not contained in any non-dominated strategy and thus belong to the set of 'exterior' RPM selections.

Summary and Conclusions
In this paper, we have developed Decision Programming as a mixed-integer linear programming approach for solving multi-stage decision problems with discrete decisions and chance events. Such problems can be represented as influence diagrams, including LIMIDs in which the usual 'no-forgetting' assumption may not hold. In this approach, risk preferences can be captured through non-linear utility functions over consequences or, alternatively, by extending the objective function with terms for risk measures or by introducing risk constraints. Multiple objectives can be handled, for instance, by using a weighted additive linear function to aggregate consequences (or their utilities) across different value nodes. The set of all non-dominated strategies can also be computed with MILP by employing a weighted linear objective function together with the sequential introduction of constraints which eliminate dominated strategies as well as already discovered non-dominated strategies from further consideration.
In the context of stochastic optimization, Decision Programming is particularly useful in mixed-integer decision problems where the probabilities in the scenario tree depend endogenously on earlier integer-valued decisions. This ability to handle endogenous uncertainties can be helpful, for instance, when appraising R&D and marketing investments, because the size of the market as well as the products' market performance are often contingent on these earlier decisions. From this perspective, the proposed approach can be viewed as a generalization of Contingent Portfolio Programming that allows the chance events describing the scenario tree to depend on project selection decisions.
Importantly, the Decision Programming framework can be employed to address problems that cannot be solved with dynamic programming techniques, such as problems in which earlier decisions cannot be recalled or in which the presence of deterministic and chance constraints makes it impractical or impossible to apply dynamic programming. Therefore, although Decision Programming has parallels to developments in stochastic mixed-integer dynamic programming (such as employing mathematical programming formulations to find optimal policies, as in the seminal work of Manne (1960) and the ensuing literature; see Bertsekas (2012) for a thorough exposition), it makes it possible to solve a broader class of problems which are not amenable to dynamic programming. Technically, the key feature of our approach is that, instead of exploiting recursion as the underpinning framework, we exploit the expressiveness of influence diagrams for problem structuring and then develop equivalent deterministic MILP formulations that can be solved with off-the-shelf MILP solvers.
Based on our numerical experiments, the Decision Programming approach allows large-scale problems to be solved to optimality. Importantly, its computational performance can be radically enhanced through probability cuts which exploit the specific properties of the probabilistic constraints as well as whatever symmetry properties the problem structure may feature.
Nevertheless, Decision Programming is subject to the well-known curse of dimensionality, just like other linear programming-based approaches for solving dynamic problems with a large number of decision periods and uncertainties. However, given that powerful MILP decomposition and parallelization techniques are becoming widely accessible, the proposed approach holds considerable promise in extending the expressiveness of influence diagrams in problem structuring while offering possibilities for handling multiple objectives subject to a much broader range of constraints than conventional approaches for building and solving influence diagrams can accommodate.
Assume that (13) holds for $j \in \{1, \ldots, k-1\}$ with $k - 1 < n$. We show that it holds for $k$, too. If $k \in C$ is a chance node, then $\{j \mid j \in D,\ j \leq k\} = \{j \mid j \in D,\ j \leq k-1\}$ and
$$\mathbb{P}(s_{1:k} \mid Z) = \prod_{\substack{i \in C \\ i \leq k}} \mathbb{P}\bigl(X_i = s_i \mid X_{I(i)} = s_{I(i)}\bigr) \prod_{\substack{j \in D \\ j \leq k}} \mathbb{I}\bigl(Z_j(s_{I(j)}) = s_j\bigr)$$
$$= \mathbb{P}\bigl(X_k = s_k \mid X_{I(k)} = s_{I(k)}\bigr) \prod_{\substack{i \in C \\ i \leq k-1}} \mathbb{P}\bigl(X_i = s_i \mid X_{I(i)} = s_{I(i)}\bigr) \prod_{\substack{j \in D \\ j \leq k-1}} \mathbb{I}\bigl(Z_j(s_{I(j)}) = s_j\bigr)$$
$$= \mathbb{P}\bigl(X_k = s_k \mid X_{I(k)} = s_{I(k)}\bigr)\, \pi_{k-1}(s) = \pi_k(s),$$
where the last equality follows from the induction hypothesis and (7). Analogously, if $k \in D$ is a decision node, then
$$\mathbb{P}(s_{1:k} \mid Z) = \mathbb{I}\bigl(Z_k(s_{I(k)}) = s_k\bigr) \prod_{\substack{i \in C \\ i \leq k-1}} \mathbb{P}\bigl(X_i = s_i \mid X_{I(i)} = s_{I(i)}\bigr) \prod_{\substack{j \in D \\ j \leq k-1}} \mathbb{I}\bigl(Z_j(s_{I(j)}) = s_j\bigr) = z(s_k \mid s_{I(k)})\, \pi_{k-1}(s) = \pi_k(s),$$
where the last equality follows from the induction hypothesis and equations (5) and (8).
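The induction step above says that each prefix probability $\pi_k(s)$ multiplies $\pi_{k-1}(s)$ by either the chance probability at node $k$ or the (0/1) decision variable $z(s_k \mid s_{I(k)})$. A minimal numerical sketch of this recursion (Python; the node sequence and values are illustrative):

```python
# Each node contributes one factor along the path: for a decision node ("D")
# the 0/1 value of z(s_k | s_I(k)), for a chance node ("C") the conditional
# probability P(X_k = s_k | X_I(k) = s_I(k)). Values here are illustrative.
nodes = [
    ("D", 1.0),   # decision node: z = 1 for the chosen state
    ("C", 0.3),   # chance node probability
    ("D", 1.0),
    ("C", 0.6),
]

def prefix_probability(k):
    """pi_k(s) via the induction step of the proof; pi_0(s) = 1."""
    if k == 0:
        return 1.0
    _, value = nodes[k - 1]
    return value * prefix_probability(k - 1)  # same step for C and D nodes

pi = prefix_probability(len(nodes))
print(pi)  # 1.0 * 0.3 * 1.0 * 0.6 = 0.18
```

For the full path ($k = n$), this recursion reproduces the product form of $\mathbb{P}(s \mid Z)$, which is the content of the proposition being proved.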