ABSTRACT

This chapter discusses the expected average cost criterion and assumes that the MDP is unichain. An MDP is said to be unichain if the corresponding Markov chain contains a single (aperiodic) ergodic class. The chapter shows that one may restrict to stationary policies without loss of optimality. It obtains an LP formulation similar to that for the discounted cost, and shows that the COP is equivalent to an LP with a countable set of decision variables and a countable set of constraints. A probabilistic interpretation of this LP can be obtained in the same way as the dual LP was obtained for the discounted cost criterion. The chapter also obtains the corresponding results for the Lagrangian.