
Reinforcement Learning and Formal Requirements

Conference paper

Numerical Software Verification (NSV 2019)
Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11652)

Abstract

Reinforcement learning is an approach to controller synthesis in which an agent relies on reward signals to choose actions that satisfy the requirements implicit in those signals. Often, non-experts must formulate the requirements and translate them into rewards under significant time pressure, even though manual translation is time-consuming and error-prone. Safety-critical applications of reinforcement learning therefore need a rigorous design methodology and, in particular, a principled approach to requirement specification and to the translation of objectives into the form required by reinforcement learning algorithms.

Formal logic provides a foundation for rigorous and unambiguous specification of learning objectives. Reinforcement learning algorithms, however, require objectives to be expressed as scalar reward signals. We discuss a recent technique, called limit-reachability, that bridges this gap by faithfully translating logic-based requirements into the scalar rewards needed by model-free reinforcement learning. This technique enables the synthesis of controllers that maximize the probability of satisfying given logical requirements using off-the-shelf, model-free reinforcement learning algorithms.
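To make the idea concrete, here is a minimal sketch, in Python, of the kind of reward scheme limit-reachability uses (following the construction of Hahn et al., TACAS 2019): the MDP is composed with a Büchi automaton for the requirement, and each time an accepting transition is taken the episode ends with reward 1 with some small probability ζ, so that maximizing expected undiscounted reward approximates maximizing the probability of satisfying the requirement. The corridor environment, the reach-avoid automaton, and all numeric values below (including ZETA) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a limit-reachability-style reward scheme (after
# Hahn et al., TACAS 2019) on a toy MDP. Environment, automaton, and
# all parameter values are illustrative assumptions.
import random

# Toy MDP: a corridor of cells 0..4; cell 4 is the goal, cell 0 the trap.
# Actions: 0 = left, 1 = right; each action slips with probability 0.1.
N_CELLS, GOAL, TRAP, SLIP = 5, 4, 0, 0.1

def mdp_step(s, a):
    move = a if random.random() > SLIP else 1 - a
    return min(max(s + (1 if move == 1 else -1), 0), N_CELLS - 1)

# Deterministic Buechi automaton for the reach-avoid requirement
# "avoid the trap until the goal is reached": q0 = searching,
# q1 = accepting loop, q2 = rejecting sink. Accepting edges return True.
def aut_step(q, s):
    if q == 0:
        if s == TRAP:
            return 2, False
        if s == GOAL:
            return 1, True
        return 0, False
    if q == 1:
        return 1, True
    return 2, False

ZETA = 0.01            # payout probability on accepting edges (assumption)
ALPHA, EPS = 0.1, 0.2  # learning rate and exploration rate
EPISODES, HORIZON = 20000, 50

# Q-learning on the product of the MDP and the automaton. Taking an
# accepting edge ends the episode with reward 1 with probability ZETA;
# all other transitions yield reward 0, so the expected return tracks
# the probability of satisfying the requirement.
Q = {}
def qval(s, q, a):
    return Q.get((s, q, a), 0.0)

for _ in range(EPISODES):
    s, q = 2, 0                          # start mid-corridor, searching
    for _ in range(HORIZON):
        if random.random() < EPS:
            a = random.randint(0, 1)
        else:
            a = max((0, 1), key=lambda b: qval(s, q, b))
        s2 = mdp_step(s, a)
        q2, accepting = aut_step(q, s2)
        if accepting and random.random() < ZETA:
            target, done = 1.0, True     # payout: reward 1, episode ends
        else:
            target, done = max(qval(s2, q2, 0), qval(s2, q2, 1)), False
        Q[(s, q, a)] = (1 - ALPHA) * qval(s, q, a) + ALPHA * target
        if done:
            break
        s, q = s2, q2

# Greedy actions in the searching state: should be 1 (right), toward the goal.
print([max((0, 1), key=lambda b: qval(s, 0, b)) for s in range(N_CELLS)])
```

In this sketch the greedy policy learned for the searching automaton state steers right, toward the goal and away from the trap, which is exactly the behavior the logical requirement demands; no hand-tuned shaping reward is needed.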



Author information

Correspondence to Ashutosh Trivedi.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Somenzi, F., Trivedi, A. (2019). Reinforcement Learning and Formal Requirements. In: Zamani, M., Zufferey, D. (eds) Numerical Software Verification. NSV 2019. Lecture Notes in Computer Science, vol 11652. Springer, Cham. https://doi.org/10.1007/978-3-030-28423-7_2


  • DOI: https://doi.org/10.1007/978-3-030-28423-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28422-0

  • Online ISBN: 978-3-030-28423-7

  • eBook Packages: Computer Science, Computer Science (R0)
