Abstract
Reinforcement learning is an approach to controller synthesis in which an agent relies on reward signals to choose actions, so that the requirements it must satisfy are only implicit in those rewards. Often, non-experts must formulate the requirements and translate them into rewards under significant time pressure, even though manual translation is time-consuming and error-prone. Safety-critical applications of reinforcement learning therefore need a rigorous design methodology and, in particular, a principled approach to requirement specification and to the translation of objectives into the form required by reinforcement learning algorithms.
Formal logic provides a foundation for the rigorous and unambiguous specification of learning objectives. However, reinforcement learning algorithms require objectives to be expressed as scalar reward signals. We discuss a recent technique, called limit-reachability, that bridges this gap by faithfully translating logic-based requirements into the scalar rewards needed by model-free reinforcement learning. The technique enables the synthesis of controllers that maximize the probability of satisfying given logical requirements using off-the-shelf, model-free reinforcement learning algorithms.
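To make the reward translation concrete, below is a minimal Python sketch of a limit-reachability-style reward scheme on the product of an MDP with a limit-deterministic Büchi automaton: whenever an accepting transition of the automaton is taken, the run moves with probability 1 − ζ to an absorbing target state that pays reward 1, and all other transitions pay 0. This is a sketch under stated assumptions, not the authors' implementation: the callables product_step and accepting, the constants ZETA, GAMMA, ALPHA, EPSILON, and the tabular Q-learning loop are all illustrative.

```python
import random
from collections import defaultdict

# Illustrative constants (assumptions, not values from the paper).
ZETA = 0.99     # as ZETA -> 1, the reachability optimum approaches
                # the maximal satisfaction probability
GAMMA = 0.999   # discount factor close to 1
ALPHA = 0.1     # learning rate
EPSILON = 0.1   # exploration rate
TARGET = "target"  # absorbing state that pays the only reward

Q = defaultdict(float)  # tabular Q-values, keyed by (state, action)

def limit_reachability_step(product_step, accepting, state, action):
    """One step of the augmented product MDP. When an accepting
    transition of the Büchi automaton is taken, jump to the rewarding
    target state with probability 1 - ZETA; reward is 1 only then."""
    next_state = product_step(state, action)  # user-supplied dynamics
    if accepting(state, action, next_state) and random.random() > ZETA:
        return TARGET, 1.0, True   # reached the target: episode ends
    return next_state, 0.0, False  # every other transition pays 0

def q_learning_episode(product_step, accepting, actions, init_state,
                       horizon=1000):
    """Standard epsilon-greedy Q-learning on the augmented product."""
    state = init_state
    for _ in range(horizon):
        if random.random() < EPSILON:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        nxt, reward, done = limit_reachability_step(
            product_step, accepting, state, action)
        best_next = 0.0 if done else max(Q[(nxt, a)] for a in actions)
        Q[(state, action)] += ALPHA * (
            reward + GAMMA * best_next - Q[(state, action)])
        if done:
            break
        state = nxt
```

Because reward is paid only on entering the target, the learned value of a state estimates the probability of eventually reaching it; the point of the limit-reachability technique is that, for ζ close enough to 1, a policy maximizing this reachability probability also maximizes the probability of satisfying the Büchi objective.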
Cite this paper
Somenzi, F., Trivedi, A. (2019). Reinforcement Learning and Formal Requirements. In: Zamani, M., Zufferey, D. (eds.) Numerical Software Verification. NSV 2019. Lecture Notes in Computer Science, vol. 11652. Springer, Cham. https://doi.org/10.1007/978-3-030-28423-7_2