
Operations Research Letters

Volume 40, Issue 6, November 2012, Pages 487-491

State partitioning based linear program for stochastic dynamic programs: An invariance property

https://doi.org/10.1016/j.orl.2012.08.006

Abstract

A common approximate dynamic programming method entails state partitioning and the use of linear programming, i.e., the state-space is partitioned and the optimal value function is approximated by a constant over each partition. By minimizing a positive cost function defined on the partitions, one can construct an upper bound for the optimal value function. We show that this approximate value function is independent of the positive cost function and that it is the least upper bound, given the partitions.

Introduction

The Linear Programming (LP) approach to solving infinite horizon stochastic Dynamic Programs (DPs) originated with the papers [18], [6], [5], [12]. The basic feature of an LP approach for solving DPs corresponding to maximization of a discounted payoff is that the optimal solution of the DP (also referred to as the optimal value function) is the optimal solution of the LP for every positive cost function. The constraint set describing the feasible solutions of the LP and the number of independent variables are typically very large (curse of dimensionality) and hence, obtaining the exact solution of a DP (stochastic or otherwise) via an LP approach is not practical. Despite this limitation, an LP approach provides a tractable method for approximate dynamic programming [19], [22], [23].

The main questions regarding the tractability and quality of approximate DP revolve around restricting the value function in a suitable way. The questions are: (1) How does one restrict the value function, i.e., what basis functions should one choose for parameterizing the value function? (2) Can one provide a posteriori bounds on the optimal value function from the solution of a restricted LP? If the restrictions imposed on the value function are consistent with the physics/structure of the problem, one can expect reasonably tight bounds. Another question naturally arises: in the unrestricted case, the optimal solution of the LP is independent of the choice of the positive cost function. While it is unreasonable to expect that the optimal value function be a feasible solution of the restricted LP, one can ask whether the optimal solution of the restricted LP is the same for every choice of positive cost function.

In this context, a common solution strategy is to approximate the value (cost-to-go) function by a linear functional of a priori chosen basis functions [22]. This approach is attractive in that, for a certain class of basis functions, feasibility of the approximate (or restricted) LP is guaranteed [4]. One such choice of basis functions is to pick indicator functions, which results in the so-called state partitioning/aggregation method. Here the state space is partitioned into disjoint sets or partitions and the approximate value function is restricted to be the same for all the states in a partition. The number of variables for the LP therefore reduces to the number of partitions (this parameterization is written out schematically after the list below). State aggregation based approximation techniques were originally proposed in [1], [2], [20]. Since then, substantial work has been reported in the literature on this topic (see [24] and references therein). In this article, we show that:

1. If one were to adopt a state partitioning approach, then the solution to the restricted LP is independent of the positive cost or objective function. Moreover, the optimal solution is dominated by every feasible solution to the restricted LP.

2. Considering a lifted LP formulation that encompasses a bigger feasible set, via iterated Bellman inequalities [25], does not improve upon the upper bound provided by the restricted LP.
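For concreteness, the parameterizations referred to above can be written schematically as follows; the weights $w_k$ and the generic basis functions $\phi_k$ are notation introduced here for illustration and are not taken verbatim from the paper:

\[
V(x) \;\approx\; \sum_{k=1}^{K} w_k\, \phi_k(x); \qquad
\phi_i(x) = \mathbf{1}\{x \in S_i\} \ \Longrightarrow\ V(x) = v(i)\ \text{for all } x \in S_i,\ i=1,\ldots,M.
\]

In the indicator case there is exactly one variable $v(i)$ per partition, which is the state partitioning/aggregation method referred to in the two items above.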

The first result is in stark contrast to schemes involving general basis functions, where the choice of the LP's cost function, or state-relevance weights as they are referred to in the literature, has a significant impact on the quality of the approximation (see [4, Section 3] for details). Our result indicates that for the state partitioning method, one need not worry about the selection of the state-relevance weights, since their choice has no bearing on the quality of the upper bound approximation to the optimal value function. The second result is essentially a negative one: we show that lifting the partitioning based restricted LP via iterated Bellman inequalities (as suggested in [25]) does not improve the bound either.

The remainder of the paper is organized as follows: we provide a general overview of stochastic dynamic programs in Section 2 followed by required linear programming preliminaries in Section 2.1. In Section 3, we introduce the partitioning based restricted LP approach used to approximate the optimal value function. In Section 3.1, we showcase the central result that the optimal solution to the restricted LP is independent of the underlying positive cost function and furthermore, that it is the least upper bound to the optimal value function. Finally, in Section 3.2, we show that expanding the feasible set via a lifted LP formulation does not help in obtaining a tighter upper bound, followed by the conclusions in Section 4.

Section snippets

Stochastic dynamic programming

Consider a discrete-time Markov decision process (MDP) with a finite state space $S=\{1,2,\ldots,|S|\}$. For each state $x \in S$, there is a finite set of available actions $U_x$. From current state $x$, taking action $u \in U_x$ results in a reward $R_u(x)$. Without loss of generality, we assume that $R_u(x) \ge 0,\ \forall x, u$, and is also bounded from above. The system follows discrete-time dynamics given by: $x(t+1)=f(x(t),u(t),Y(t))$, where $t$ indicates time. We assume that the random input $Y$ can only take a finite set of values $Y_l$; $l=0,$
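For reference, the exact LP alluded to in the introduction typically takes the following form; the discount factor $\alpha \in (0,1)$ and the probabilities $p_s = \mathbb{P}(Y = Y_s)$ are notation assumed here for the sketch and may differ from the paper's:

\[
\begin{aligned}
\min_{V \in \mathbb{R}^{|S|}} \quad & \sum_{x \in S} c(x)\, V(x) \\
\text{s.t.} \quad & V(x) \;\ge\; R_u(x) + \alpha \sum_{s} p_s\, V\bigl(f(x,u,Y_s)\bigr), \qquad \forall\, x \in S,\ u \in U_x,
\end{aligned}
\]

where $c$ is any positive cost (state-relevance) vector. Every feasible $V$ dominates the optimal value function and, as noted in the introduction, the minimizer coincides with the optimal value function for every such $c$.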

State partitioning based restricted LP

In this section, we consider a state partitioning based restricted LP to compute upper bound approximations to the optimal value function $V^*$. Similar to the exact LP, we establish an interesting uniqueness property for the optimal solution to the restricted LP. Let the set of all states $S$ be partitioned into $M$ disjoint sets, $S_i$, $i=1,\ldots,M$. We are interested in a piecewise constant approximation of the optimal value function given by: $V(x)=v(i)$, $x \in S_i$, $i=1,\ldots,M$. We introduce the notation: if $f(x,u,Y_s)$
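The following Python sketch illustrates the invariance property numerically on a toy problem. The MDP data, the partition, the discount factor, and the use of explicit transition probabilities $P(x' \mid x, u)$ in place of the dynamics $f(x,u,Y_s)$ are all assumptions made for the sketch, not data from the paper.

```python
# Toy illustration (not from the paper): the restricted LP's optimal aggregated
# values v(i) should not change when the positive cost vector c is changed.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

n_states, n_actions, alpha = 6, 2, 0.9
partition = np.array([0, 0, 1, 1, 2, 2])   # three partitions: {0,1}, {2,3}, {4,5}
M = partition.max() + 1

# Nonnegative rewards R_u(x) and transition probabilities P(x'|x,u), invented here.
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

def solve_restricted_lp(c):
    """min c^T v  s.t.  v(i(x)) >= R_u(x) + alpha * sum_{x'} P(x'|x,u) v(i(x'))."""
    A_ub, b_ub = [], []
    for x in range(n_states):
        for u in range(n_actions):
            row = np.zeros(M)
            row[partition[x]] -= 1.0                       # -v(i(x))
            for xn in range(n_states):
                row[partition[xn]] += alpha * P[x, u, xn]  # + alpha * E[v(i(x'))]
            A_ub.append(row)                               # row @ v <= -R_u(x)
            b_ub.append(-R[x, u])
    # The bound v >= 0 is harmless here: rewards are nonnegative, so the least
    # feasible aggregated value function is nonnegative as well.
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0, None)] * M, method="highs")
    return res.x

v_uniform = solve_restricted_lp(np.ones(M))
v_random = solve_restricted_lp(rng.uniform(0.5, 2.0, size=M))
print(np.allclose(v_uniform, v_random))   # expected: True (invariance property)
```

On such an instance both solves return the same aggregated values, consistent with the claim that the optimum of the restricted LP is the componentwise least feasible point and is therefore independent of the positive cost vector.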

Conclusions

A state partitioning based restricted LP method is considered for large-scale stochastic DPs. Contrary to existing results on approximate LPs, we have shown that the optimal solution to the state partitioning based restricted LP is independent of the underlying positive cost function. The implication of this result for a practitioner is that one need not spend a lot of effort (as is typically the case) on the selection of the state-relevance weights, as they do not matter. Furthermore, we

References (25)

  • H.J. Greenberg et al., Surrogate mathematical programming, Oper. Res. (1970).

  • M. Grötschel et al., Solution of large-scale symmetric travelling salesman problems, Math. Program. (1991).