State partitioning based linear program for stochastic dynamic programs: An invariance property
Introduction
The Linear Programming (LP) approach to solving infinite horizon stochastic Dynamic Programs (DPs) originated in [18], [6], [5], [12]. The basic feature of the LP approach, for DPs that maximize a discounted payoff, is that the optimal solution of the DP (also referred to as the optimal value function) is the optimal solution of the LP for every positive cost function. However, the constraint set describing the feasible solutions of the LP and the number of independent variables are typically very large (the curse of dimensionality), so obtaining the exact solution of a DP (stochastic or otherwise) via an LP is not practical. Despite this limitation, the LP approach provides a tractable method for approximate dynamic programming [19], [22], [23]. The main questions regarding the tractability and quality of approximate DP revolve around restricting the value function in a suitable way: (1) How does one restrict the value function, i.e., which basis functions should one choose to parameterize it? (2) Can one provide a posteriori bounds on the optimal value function from the solution of a restricted LP? If the restrictions imposed on the value function are consistent with the physics/structure of the problem, one can expect reasonably tight bounds. A further question arises naturally: in the unrestricted case, the optimal solution of the LP is independent of the choice of positive cost function. While it is unreasonable to expect the optimal value function to be a feasible solution of the restricted LP, one can ask whether the optimal solution of the restricted LP is the same for every choice of positive cost function. In this context, a common solution strategy is to approximate the value (cost-to-go) function by a linear combination of a priori chosen basis functions [22].
This approach is attractive because, for a certain class of basis functions, feasibility of the approximate (or restricted) LP is guaranteed [4]. One such choice of basis functions is indicator functions, which results in the so-called state partitioning/aggregation method. Here the state space is partitioned into disjoint sets and the approximate value function is restricted to be constant over each set; the number of variables in the LP therefore reduces to the number of partitions. State aggregation based approximation techniques were originally proposed in [1], [2], [20], and substantial work has since been reported on this topic (see [24] and the references therein). In this article, we show that:
1. If one adopts a state partitioning approach, then the solution to the restricted LP is independent of the positive cost (objective) function. Moreover, the optimal solution is dominated by every feasible solution of the restricted LP.
2. A lifted LP formulation with a larger feasible set, obtained via iterated Bellman inequalities [25], does not improve upon the upper bound provided by the restricted LP.
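The second claim can be checked numerically on a toy instance. The sketch below is our own construction, not taken from the paper: the MDP data, the 3-set partition, and names such as `Phi` are illustrative assumptions. It solves the partition-restricted LP once with the single Bellman inequality and once with the lifted two-step iterated Bellman inequality, and the two optimal bounds coincide.

```python
import numpy as np
from scipy.optimize import linprog

# Toy discounted MDP (illustrative data only): 6 states, 2 actions, discount 0.9.
rng = np.random.default_rng(2)
nS, nA, alpha = 6, 2, 0.9
g = rng.uniform(0.0, 1.0, size=(nS, nA))          # rewards g(x, u)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # transition kernel P(x' | x, u)
Phi = np.zeros((nS, 3))
for x in range(nS):
    Phi[x, x // 2] = 1.0                           # indicator basis: state x in set x // 2
c = Phi.T @ np.ones(nS)                            # positive cost, on the z-variables

# Restricted LP:  min c^T z  s.t.  Phi z >= T(Phi z)  (one Bellman inequality).
A1, b1 = [], []
for x in range(nS):
    for u in range(nA):
        A1.append(alpha * (P[x, u] @ Phi) - Phi[x])
        b1.append(-g[x, u])
res0 = linprog(c, A_ub=np.array(A1), b_ub=np.array(b1), bounds=[(None, None)] * 3)

# Lifted LP with variables (z, z1):  Phi z >= T(Phi z1)  and  Phi z1 >= T(Phi z),
# i.e. a two-step iterated Bellman inequality; the feasible set is larger.
rows, rhs = [], []
for x in range(nS):
    for u in range(nA):
        pPhi = alpha * (P[x, u] @ Phi)
        rows.append(np.concatenate([-Phi[x], pPhi]))   # Phi z  >= g + alpha P Phi z1
        rows.append(np.concatenate([pPhi, -Phi[x]]))   # Phi z1 >= g + alpha P Phi z
        rhs += [-g[x, u], -g[x, u]]
obj = np.concatenate([c, np.zeros(3)])                 # cost only on z
res1 = linprog(obj, A_ub=np.array(rows), b_ub=np.array(rhs),
               bounds=[(None, None)] * 6)
# res1.fun matches res0.fun: the lifted formulation does not improve the bound.
```

Since the lifted feasible set contains the restricted one, its optimum can only be smaller or equal; the point of claim 2 is that it is in fact equal.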
The remainder of the paper is organized as follows: we provide a general overview of stochastic dynamic programs in Section 2 followed by required linear programming preliminaries in Section 2.1. In Section 3, we introduce the partitioning based restricted LP approach used to approximate the optimal value function. In Section 3.1, we showcase the central result that the optimal solution to the restricted LP is independent of the underlying positive cost function and furthermore, that it is the least upper bound to the optimal value function. Finally, in Section 3.2, we show that expanding the feasible set via a lifted LP formulation does not help in obtaining a tighter upper bound, followed by the conclusions in Section 4.
Stochastic dynamic programming
Consider a discrete-time Markov decision process (MDP) with a finite state space S. For each state x ∈ S, there is a finite set of available actions A(x). From the current state x, taking an action u ∈ A(x) results in a reward g(x, u). Without loss of generality, we assume that the rewards are nonnegative and bounded from above. The system follows the discrete-time dynamics x_{t+1} = f(x_t, u_t, w_t), where t = 0, 1, 2, … indicates time. We assume that the random input w_t can only take values in a finite set W;
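For a discounted reward-maximization problem of this kind, the exact LP characterization mentioned in the introduction can be sketched on a minimal toy instance (our own illustrative data and names, using SciPy): minimizing c^T V subject to the Bellman inequalities recovers the optimal value function for any positive cost vector c.

```python
import numpy as np
from scipy.optimize import linprog

# Toy discounted MDP (all data illustrative): 4 states, 2 actions, discount 0.9.
rng = np.random.default_rng(0)
nS, nA, alpha = 4, 2, 0.9
g = rng.uniform(0.0, 1.0, size=(nS, nA))          # rewards g(x, u), nonnegative and bounded
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # transition probabilities P(x' | x, u)

def exact_lp(c):
    """min c^T V  s.t.  V(x) >= g(x,u) + alpha * sum_x' P(x'|x,u) V(x')  for all x, u."""
    A_ub, b_ub = [], []
    for x in range(nS):
        for u in range(nA):
            row = alpha * P[x, u].copy()
            row[x] -= 1.0                          # move V(x) to the left-hand side
            A_ub.append(row)
            b_ub.append(-g[x, u])
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * nS)
    return res.x

# Any positive cost vector recovers the same optimal value function V*.
V1 = exact_lp(np.ones(nS))
V2 = exact_lp(rng.uniform(0.5, 2.0, size=nS))
```

Here V1 and V2 agree, and both satisfy the Bellman equation V = TV, illustrating the cost-independence of the unrestricted LP.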
State partitioning based restricted LP
In this section, we consider a state partitioning based restricted LP to compute upper bound approximations to the optimal value function V*. Similar to the exact LP, we establish an interesting uniqueness property for the optimal solution to the restricted LP. Let the set of all states S be partitioned into n disjoint sets S_1, …, S_n. We are interested in a piecewise constant approximation of the optimal value function, taking the value z_i on each S_i. We introduce the notation:
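A minimal numerical sketch of this restricted LP (our own toy instance; the partition, the data, and names such as `Phi` and `restricted_lp` are assumptions for illustration) shows the piecewise constant approximation and the invariance of the optimizer to the positive cost vector:

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance (all data illustrative): 6 states, 2 actions, discount 0.9.
rng = np.random.default_rng(1)
nS, nA, alpha = 6, 2, 0.9
g = rng.uniform(0.0, 1.0, size=(nS, nA))          # nonnegative bounded rewards
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # transition kernel P(x' | x, u)

# Partition the states into 3 disjoint sets; Phi[x, i] = 1 iff x belongs to S_i,
# so Phi @ z is piecewise constant: equal to z_i on S_i.
parts = [[0, 1], [2, 3], [4, 5]]
Phi = np.zeros((nS, len(parts)))
for i, Si in enumerate(parts):
    Phi[Si, i] = 1.0

def restricted_lp(c):
    """min c^T Phi z  s.t.  (Phi z)(x) >= g(x,u) + alpha * P(.|x,u) @ Phi z  for all x, u."""
    A_ub, b_ub = [], []
    for x in range(nS):
        for u in range(nA):
            A_ub.append(alpha * (P[x, u] @ Phi) - Phi[x])
            b_ub.append(-g[x, u])
    res = linprog(Phi.T @ c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * len(parts))
    return res.x

# Two different positive state-relevance (cost) vectors ...
z_a = restricted_lp(np.ones(nS))
z_b = restricted_lp(rng.uniform(0.5, 2.0, size=nS))
# ... yield the same optimal z: the least feasible solution of the restricted LP.
```

The constraints have nonnegative coefficients on z apart from the diagonal term, so the feasible set has a least element; every positive cost vector picks out that same element, which is the invariance property established in this section.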
Conclusions
A state partitioning based restricted LP method is considered for large scale stochastic DPs. In contrast to existing results on approximate LPs, we have shown that the optimal solution to the state partitioning based restricted LP is independent of the underlying positive cost function. The practical implication of this result is that a practitioner need not spend significant effort (as is typically the case) on the selection of the state-relevance weights, since they do not matter.
References (25)
- State aggregation in dynamic programming: an application to scheduling of independent jobs on parallel processors, Oper. Res. Lett. (1983)
- A modified dynamic programming method for Markovian decision problems, J. Math. Anal. Appl. (1966)
- Generalized polynomial approximations in Markovian decision processes, J. Math. Anal. Appl. (1985)
- Aggregation in dynamic programming, Oper. Res. (1987)
- Dynamic Programming (1957)
- The linear programming approach to approximate dynamic programming, Oper. Res. (2003)
- On linear programming in a Markov decision problem, Manage. Sci. (1970)
- A probabilistic production and inventory problem, Manage. Sci. (1963)
- Surrogate constraints, Oper. Res. (1968)
- Surrogate constraint duality in mathematical programming, Oper. Res. (1975)
- Surrogate mathematical programming, Oper. Res.
- Solution of large-scale symmetric travelling salesman problems, Math. Program.
Cited by (3)
- Performance Guarantee of an Approximate Dynamic Programming Policy for Robotic Surveillance, IEEE Transactions on Automation Science and Engineering (2016)
- Approximate Dynamic Programming Applied to UAV Perimeter Patrol, Lecture Notes in Control and Information Sciences (2013)
- Lower bounding linear program for the perimeter patrol optimization problem, Journal of Guidance, Control, and Dynamics (2014)