
Dynamic programming and behavioral rules

Abstract

The standard dynamic programming approach to exact optimization of sequential decision problems is extended to allow approximate optimization. An optimization-based solution is a behavioral rule that satisfies a modified Bellman equation. Existence of a solution is proven. Conversely, given any behavioral rule, there exist infinitely many dynamic programming problems for which the behavioral rule is an optimization-based solution. This result raises the question of whether we need to continue assuming decision makers optimize or approximately optimize.
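
For a concrete picture of what an optimization-based behavioral rule looks like computationally, the following is a minimal numerical sketch, not the paper's construction: finite-state value iteration in which the exact maximization of the standard Bellman equation is replaced by a logistic (logit) choice rule of the kind discussed in the notes below. The rewards, transition probabilities, discount factor, and precision parameter are all invented for illustration, and the particular update used is an assumption.

```python
import numpy as np

# Minimal sketch (not the paper's construction): value iteration in which
# the Bellman max is replaced by a logit behavioral rule with precision gamma.
# The rewards r[a, s] and transitions f[a, s, s'] below are invented.

n_states, n_actions, beta, gamma = 3, 2, 0.95, 5.0
rng = np.random.default_rng(0)
r = rng.uniform(0.0, 1.0, size=(n_actions, n_states))              # r(a, omega)
f = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))   # f(.|a, omega)

V = np.zeros(n_states)
for _ in range(1000):
    v = r + beta * f @ V                       # v(a, omega) = r + beta E[V(omega')]
    p = np.exp(gamma * (v - v.max(axis=0)))    # logit rule p(a | omega; gamma)
    p /= p.sum(axis=0)
    V_new = (p * v).sum(axis=0)                # modified Bellman update: E_p[v]
    if np.abs(V_new - V).max() < 1e-10:
        break
    V = V_new

print("fixed-point values V(omega):", V)
```

As gamma grows large, the update approaches exact value iteration; for moderate gamma it captures "approximate optimization" in the sense of the abstract.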

Notes

  1. I am referring to conscious decision making, not autonomic (subconscious) decisions made continuously and in parallel to conscious decision making.

  2. See Eckstein and Wolpin (1989) for a survey of empirical applications of dynamic programming.

  3. Also known as a randomized policy, or a probabilistic decision rule.

  4. Note that since the feasible actions \(A_{t+1}\) depend on \(\omega_{t+1}\), current actions \(a_t\) can affect \(A_{t+1}\) via \(f(\omega_{t+1} \mid a_t, \omega_t)\). Thus, we include the case of Fudenberg and Strzalecki (2015), but without their restriction that such future menu effects have no current cost.

  5. By the Ionescu-Tulcea theorem (Feinberg 1996), these probability measures exist for all \(t > 1\). Since \(f\) is uniformly continuous, it follows that \(f(\omega_t \mid p, \omega_1)\) is continuous in \(p\). Then, given \(r \in B_1(A \times \Omega)\), \(V(\cdot, \omega)\) is continuous on \(\mathcal{P}(A \times \Omega)\).

  6. Eq. (4) generalizes the standard Bellman equation to allow for randomized actions \(p\).
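
     Eq. (4) itself is not reproduced on this page. Under the notation of the surrounding notes, and assuming a standard discounted formulation (an assumption here, not a quotation), a Bellman equation generalized to randomized actions can be written as:

```latex
% Assumed rendering of a Bellman equation generalized to randomized actions;
% the paper's exact Eqs. (4)-(5) are not reproduced on this page.
V(p,\omega) = \int_{A} \Big[\, r(a,\omega)
  + \beta \int_{\Omega} V^{*}(\omega')\, f(d\omega' \mid a,\omega) \Big]\, p(da \mid \omega),
\qquad
V^{*}(\omega) = \max_{p \in \mathcal{P}(A\times\Omega)} V(p,\omega).
```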

  7. To see this, consider only “pure-strategy” decision rules, denoted \(\underline{a}\), that put probability one on a unique action for each state. Since \(V(\cdot, \omega)\) is continuous and \(A\) is compact, there exists a pure-strategy solution to Eqs. (4) and (5). Since any optimal mixed strategy \(p^* \in \mathcal{P}(A \times \Omega)\) must have support contained within the set of pure-strategy solutions, \(\max_{p} V(p, \omega) = \max_{\underline{a}} V(\underline{a}, \omega) = V^*(\omega)\).

  8. The logistic model has been widely used and justified (e.g., Mattson and Weibull 2002).

  9. Note that as \(\gamma \rightarrow \infty\), \(p(a \mid \omega, u; \gamma) \rightarrow 0\) for all \(a \notin \arg\max_{a} v(a, \omega)\). We put a semicolon before \(\gamma\) in the notation for \(p(\cdot)\) to indicate that it is a fixed parameter, although we will often suppress \(\gamma\) in the notation.
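
     As a quick numerical check of this limit, here is a small sketch; the payoff vector \(v(a, \omega)\) for a single fixed state \(\omega\) is invented for illustration:

```python
import numpy as np

# Logit rule p(a | omega; gamma) over a finite action set, for one fixed omega.
def logit_rule(v, gamma):
    w = np.exp(gamma * (v - v.max()))   # subtract the max for numerical stability
    return w / w.sum()

v = np.array([1.0, 1.5, 0.7])           # hypothetical payoffs; argmax is action 1
for gamma in (1.0, 10.0, 100.0):
    print(gamma, logit_rule(v, gamma))
# As gamma -> infinity, the mass on every a outside argmax v(a, omega) goes to 0.
```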

  10. When \(A\) is countable, we can partition the [0, 1] interval into countably many semi-open sets, each representing one of the discrete actions, and then construct \(\lambda\) from the Lebesgue measure on [0, 1]. When \(A\) is homeomorphic to a compact subset of \(\mathbb{R}^n\), we can construct \(\lambda\) from the homeomorphism and the Lebesgue measure on \(\mathbb{R}^n\). Otherwise, we can use the construction of Dembski (1990).
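
     One concrete way to set up such a partition for a countably infinite action set (an illustrative choice, not necessarily the paper's) assigns action i the semi-open interval [1 - 2^(-i), 1 - 2^(-(i+1))); the intervals are disjoint, cover [0, 1), and their Lebesgue lengths sum to 1:

```python
# Illustrative partition of [0, 1) into countably many semi-open intervals,
# one per discrete action i = 0, 1, 2, ...; lambda assigns action i the
# Lebesgue length of its interval, 2 ** -(i + 1).
def interval(i):
    return (1.0 - 2.0 ** (-i), 1.0 - 2.0 ** (-(i + 1)))

def measure(i):
    lo, hi = interval(i)
    return hi - lo

print([interval(i) for i in range(4)])      # [(0.0, 0.5), (0.5, 0.75), ...]
print(sum(measure(i) for i in range(50)))   # partial sums approach 1
```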

  11. E.g., see Khamsi and Kirk (2001).

  12. Rust (1994) considers the case in which the state space observed by the DM includes a variable that is hidden from outside observers. He assumes that the DM chooses an exact solution, and hence, by virtue of the contraction mapping property of the Bellman equation, the value function is unique. On the other hand, outside observers can observe only the expected decision rule averaged over the hidden variable. With additional assumptions on the distribution of the hidden variable and on the state transition function, Rust proves that the expected decision rule has the logistic form and satisfies a modified Bellman equation analogous to Eq. (6). We do not need these additional assumptions.

  13. See footnote 10.

  14. Moreover, while outside observers may have reasonably accurate information about observable monetary and physical consequences, they cannot know \(\hat{r}(a, \omega)\), since the latter is an unobservable representation of the DM’s preferences over uncertain monetary and physical consequences.

  15. Rust (1994) shows a similar result for the restricted class of deterministic decision rules.

  16. This result contrasts with Harstad and Selten (2013), who argue that there is a fundamental difference between optimization-based behavior and boundedly rational behavior. Crawford (2013) provides further insightful comments.

  17. For a critique of this aggregation approach in terms of neoclassical welfare economics, see Blackorby and Donaldson (1990).

References

  • Anscombe, F.J., Aumann, R.J.: A definition of subjective probability. Ann. Math. Stat. 34, 199–205 (1963)

  • Bellman, R.: The theory of dynamic programming. Bull. Am. Math. Soc. 60, 503–516 (1954)

  • Bernheim, B.D.: Behavioral welfare economics. J. Eur. Econ. Assoc. 7, 267–319 (2009)

  • Bernheim, B.D., Rangel, A.: Beyond revealed preference: choice-theoretic foundations for behavioral welfare economics. Q. J. Econ. 124, 51–104 (2009)

  • Bertsekas, D.P.: Dynamic Programming and Optimal Control, 2nd edn. Athena Scientific (2000)

  • Blackorby, C., Donaldson, D.: A review article: the case against the use of the sum of compensating variations in cost-benefit analysis. Can. J. Econ. 23, 471–494 (1990)

  • Crawford, V.: Boundedly rational versus optimization-based models of strategic thinking and learning in games. J. Econ. Lit. 51, 512–527 (2013)

  • Dembski, W.: Uniform probability. J. Theor. Probab. 3, 611–625 (1990)

  • Eckstein, Z., Wolpin, K.: The specification and estimation of dynamic stochastic discrete choice models: a survey. J. Human Resour. 24, 562–598 (1989)

  • Feinberg, E.: On measurability and representation of strategic measures in Markov decision processes. In: Statistics, Probability and Game Theory. IMS Lecture Notes-Monograph Series, vol. 30 (1996)

  • Fleurbaey, M., Schokkaert, E.: Behavioral welfare economics and redistribution. Am. Econ. J. Microecon. 5, 180–205 (2013)

  • Fudenberg, D., Strzalecki, T.: Dynamic logit with choice aversion. Econometrica 83, 651–691 (2015)

  • Harstad, R., Selten, R.: Bounded-rationality models: tasks to become intellectually competitive. J. Econ. Lit. 51, 496–511 (2013)

  • Houthakker, H.S.: Revealed preference and the utility function. Economica 17, 159–174 (1950)

  • Karni, E., Schmeidler, D., Vind, K.: On state dependent preferences and subjective probabilities. Econometrica 51, 1021–1031 (1983)

  • Khamsi, M.A., Kirk, W.A.: An Introduction to Metric Spaces and Fixed Point Theory. Wiley (2001)

  • Mattson, L.-G., Weibull, J.: Probabilistic choice and procedural rationality. Games Econ. Behav. 41, 61–78 (2002)

  • Rabin, M.: Incorporating limited rationality into economics. J. Econ. Lit. 51, 528–543 (2013)

  • Rubinstein, A., Salant, Y.: Eliciting welfare preferences from behavioral data sets. Rev. Econ. Stud. 79, 375–387 (2012)

  • Rust, J.: Structural estimation of Markov decision processes. In: Handbook of Econometrics, vol. 4, pp. 3081–3143 (1994)

  • Samuelson, P.: A note on the pure theory of consumer’s behavior. Economica 5, 61–71 (1938)

  • Samuelson, P.: Consumption theory in terms of revealed preference. Economica 15, 243–253 (1948)

Author information

Correspondence to Dale O. Stahl.

Additional information

The author is indebted to Max Stinchcombe, Tom Wiseman and anonymous referees for criticisms and comments. All errors and interpretations are the sole responsibility of the author.

Cite this article

Stahl, D.O. Dynamic programming and behavioral rules. Econ Theory Bull 5, 165–174 (2017). https://doi.org/10.1007/s40505-016-0110-3
