Discovery and learning of models with predictive state representations for dynamical systems without reset
Introduction
Modeling dynamical systems is a common problem in science and engineering, with wide application in fields such as artificial intelligence, operations research, and computer science [1]. In artificial intelligence, one central and challenging problem concerns agents operating in environments that are partially observable and stochastic, i.e., how an agent can plan and act optimally in such environments. A commonly used approach to this problem is to first build a model of the system and then solve the problem using the obtained model.
Thus far, perhaps the most general framework for modeling such controlled dynamical systems is the Partially Observable Markov Decision Process (POMDP) [2], but it is well known that the POMDP model is based on unobserved or hidden states and requires much prior knowledge to learn in practice. As an alternative, Littman, Sutton, and Singh (2002) proposed a new framework for modeling such systems, called predictive state representations (PSRs) [3], which models a system by defining and operating on the PSR state. A PSR state is a vector of predictions, or outcome probabilities, for tests that could be performed on the system, where a test is a sequence of action-observation pairs. The PSR state summarizes all information regarding the past, and using it allows one to transform the original POMDP into a Markov decision process (MDP). Unlike the POMDP model, which is based on unobserved or hidden states, the PSR model is expressed entirely in terms of observable quantities. Therefore, learning PSR models of dynamical systems from observation data should be easier and less prone to local-minimum problems than learning POMDP models, and PSR models are potentially more compact than POMDP models. At the same time, the PSR-based model is more expressive than the nth-order Markov model [3]. It has been proved that the PSR state involves only the predictions of core tests, rather than the predictions of all tests. The details are discussed in Section 2.
Controlled dynamical systems can be divided into two types: systems with a reset action and systems without one. In many real environments, a reset action is not available, i.e., there is no action that the learning algorithm can invoke to return the system to its original initial condition [4]. Therefore, the only data available for obtaining the PSR model is a continuous trace of actions and observations, and how to use such data to build a PSR model is an important research problem for PSRs.
To date, the primary method for discovery and learning of predictive state representations in dynamical systems without reset is a Monte Carlo algorithm called suffix-history [4], [5]. The idea behind suffix-history is to treat every time step of the training data as the start of a new sample, i.e., the history at every time step of the training data is treated as the null history. Here the definition of history is similar to that of test, in that both are sequences of action-observation pairs, but a history is constrained to start from the beginning of time and describes the full sequence of past events. For example, given training data such as a1o1a2o2a3o3, the suffix-history algorithm yields the training sequences a1o1a2o2a3o3, a2o2a3o3, and a3o3, and the prediction of a test at a history can then be estimated by Monte Carlo approaches. The algorithm is described in detail in Ref. [6]. A potential problem with this algorithm is that the history it identifies may not be the actual history. As a consequence, the prediction vector at the identified history may be a weighted sum of the prediction vectors at different actual histories. It is therefore possible that prediction vectors other than the initial prediction vector are incorrect, and only under some special conditions can the PSR model obtained by this algorithm reflect the true PSR model of the system [6]. In the work of Ref. [7], an interesting algorithm called the constrained gradient algorithm was also proposed to address the problem of discovering and learning the PSR model in dynamical systems without reset. This algorithm exploits several structural constraints of a valid prediction matrix to obtain the prediction of a test at a history, which differs from the majority of existing PSR discovery and learning algorithms, which use Monte Carlo prediction estimation [4], [8], [9], [10], [11], [12].
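The suffix-history sampling and Monte Carlo estimation described above can be sketched as follows (the function names and the toy trace are illustrative, not from the paper):

```python
def suffix_histories(trace):
    """Treat every time step of a trace of (action, observation)
    pairs as the start of a new sample (the suffix-history idea)."""
    return [trace[i:] for i in range(len(trace))]

def monte_carlo_prediction(samples, history, test):
    """Estimate p(test | history) by counting, over all suffix samples
    that begin with `history`, how often `test` succeeds among the
    samples in which the test's actions were actually executed."""
    h, t = len(history), len(test)
    tried = matched = 0
    for s in samples:
        if s[:h] != history:
            continue
        # only samples where the test's action sequence was executed count
        if [a for a, _ in s[h:h + t]] == [a for a, _ in test]:
            tried += 1
            if s[h:h + t] == test:
                matched += 1
    return matched / tried if tried else 0.0

trace = [('a1', 'o1'), ('a2', 'o2'), ('a1', 'o1'), ('a2', 'o2')]
samples = suffix_histories(trace)
print(monte_carlo_prediction(samples, [], [('a1', 'o1')]))  # -> 1.0
```

Note that the estimator conditions on the test's actions having been taken, which is why a blind (action-independent) policy makes the Monte Carlo estimate unbiased.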
However, as noted in that paper, when this method is used to obtain the prediction of a test at a history, the threshold parameter for the condition number can only be determined empirically, whereas it can be determined by a rule when the predictions are estimated by Monte Carlo approaches [8]. Because such a threshold parameter is a key component of the core-test discovery process and is very difficult to determine without such a rule, in this paper we do not consider the work of Ref. [7] further and instead use Monte Carlo approaches to estimate the prediction of a test at a history.
The key component of discovery and learning of PSRs in dynamical systems without reset is identifying the history at any time step in the training data, because once the history at any time step can be identified, it is straightforward to obtain the PSR model [4], [8]. In this paper, after obtaining the set of landmarks of a system and proving that PSR models that use different states as the initial state have the same model parameters, we propose a method for identifying the history at any time step in the training data, from which the PSR model can then be obtained. We start with a brief introduction to PSRs in Section 2. In Section 3, the problem of identifying the history at any time step and obtaining the PSR model is addressed. The experimental results are evaluated and compared in Section 4, and the conclusion is given in Section 5. The systems we are interested in are discrete-time, finite-observation, stochastic, and representable by a finite POMDP model.
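The landmark idea can be sketched in miniature (the function name and the flat-trace encoding are illustrative assumptions, not the paper's implementation): once a landmark, i.e., a memory that uniquely identifies the system state, has just been observed in the continuous trace, the state at that point is known, so the history at every subsequent time step is identifiable relative to that point.

```python
def landmark_anchors(flat_trace, landmarks):
    """Positions in a flat a1, o1, a2, o2, ... trace immediately after
    which the state is known, because one of the landmark memories
    (tuples of symbols) has just been observed."""
    anchors = []
    for end in range(1, len(flat_trace) + 1):
        for lm in landmarks:
            n = len(lm)
            if end >= n and tuple(flat_trace[end - n:end]) == tuple(lm):
                anchors.append(end)
                break
    return anchors

flat = ['a1', 'o1', 'a2', 'o2', 'a1', 'o1']
print(landmark_anchors(flat, [('a1', 'o1')]))  # -> [2, 6]
```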
Predictive state representations
As mentioned above, PSR-based models represent the state of a system as a vector of predictions for core tests. Given a discrete set of actions and a discrete set of observations, the prediction of a length-m test t = a1o1...amom at a length-n history h = a^1o^1...a^no^n is defined as p(t|h) = prob(o1, ..., om | h, a1, ..., am) [12]. A set of tests Q = {q1, ..., qk} constitutes a PSR if its prediction vector p(Q|h) = [p(q1|h), ..., p(qk|h)] is a sufficient statistic for the history, i.e., the prediction of every test can be computed from it.
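For concreteness, the standard linear-PSR state update from Littman et al. [3], where each test t has a weight vector m_t with p(t|h) = p(Q|h)^T m_t, can be sketched as follows (the numerical parameters here are made up for illustration):

```python
import numpy as np

def psr_update(state, m_ao, M_ao):
    """Linear PSR state update after taking action a and observing o.

    state : p(Q|h), predictions of the core tests at history h
    m_ao  : weight vector giving p(ao|h) = state @ m_ao
    M_ao  : matrix whose i-th column gives p(ao q_i | h) = state @ M_ao[:, i]
    Returns p(Q | h.ao) by Bayes' rule: p(ao q_i | h) / p(ao | h).
    """
    return (state @ M_ao) / (state @ m_ao)

# Illustrative two-core-test example (all parameters are made up).
state = np.array([0.5, 0.5])
m_ao = np.array([0.4, 0.6])            # p(ao|h) = 0.5
M_ao = np.array([[0.2, 0.1],
                 [0.3, 0.4]])          # columns: p(ao q_i | h) weights
print(psr_update(state, m_ao, M_ao))   # -> [0.5 0.5]
```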
Preliminaries
In the previous work of James et al. [11], the definitions of memory and landmark and the properties of landmarks were given.
Definition of memory: a sequence of alternating actions and observations that ends with an observation. A length-n history contains n action-observation pairs, whereas a length-n memory contains n actions and observations in total, and a memory need not start with an action or at the initial time step.
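The distinction between histories and memories can be illustrated with a small sketch (the helper name and the flattened a1, o1, a2, o2, ... trace encoding are assumptions for illustration): a memory may begin anywhere in the trace, as long as it ends with an observation.

```python
def length_n_memories(trace, n):
    """All length-n memories occurring in a flat action/observation trace.

    `trace` is the flattened sequence a1, o1, a2, o2, ...; a length-n
    memory is any n consecutive symbols that end with an observation,
    so it need not start with an action or at the initial time step.
    """
    memories = set()
    for end in range(n, len(trace) + 1):
        if end % 2 == 0:   # even slice ends land on an observation
            memories.add(tuple(trace[end - n:end]))
    return memories

trace = ['a1', 'o1', 'a2', 'o2', 'a1', 'o2']
print(sorted(length_n_memories(trace, 2)))
# -> [('a1', 'o1'), ('a1', 'o2'), ('a2', 'o2')]
```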
Experiments and results
In this section, we test the performance of our algorithm on a set of dynamical systems taken from Cassandra's POMDP page [13]; these systems are commonly used in evaluating PSRs and POMDPs. For each system, training data was generated. Existing work shows how to obtain training data using a non-blind policy [12]; however, since our objective is only to evaluate the effect of identifying the history with the proposed method, we use a blind policy, i.e., a uniform random policy over the actions.
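Data collection under a blind policy can be sketched as follows (the `env.step` simulator interface and the toy environment are assumptions for illustration, not the benchmark systems used in the experiments):

```python
import random

def generate_trace(env, actions, steps, seed=0):
    """Collect one continuous trace of action-observation pairs (no reset)
    under a blind policy: actions are drawn uniformly at random,
    independent of the history so far."""
    rng = random.Random(seed)
    trace = []
    for _ in range(steps):
        a = rng.choice(actions)   # blind: ignores all past data
        o = env.step(a)           # assumed simulator interface
        trace.append((a, o))
    return trace

# Toy deterministic environment standing in for a POMDP simulator.
class ToyEnv:
    def step(self, action):
        return 'o1' if action == 'a1' else 'o2'

trace = generate_trace(ToyEnv(), ['a1', 'a2'], steps=5)
print(trace)
```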
Conclusion
We have demonstrated in this paper that any landmark can be used as the initial state. The concept of landmark is then used to identify the history, so the history at any time step besides the null history can be identified, which was unsolved by existing PSR discovery and learning algorithms for dynamical systems without reset; the PSR model can then be obtained using only a continuous trace of actions and observations as training data. Our empirical results on a standard set of problem domains confirm the effectiveness of the proposed approach.
References (14)
- S. Singh, M. James, M. Rudary, Predictive state representations: a new theory for modeling dynamical systems, in:...
- L.P. Kaelbling, M.L. Littman, A.R. Cassandra, Planning and acting in partially observable stochastic domains, Artif....
- M. Littman, R. Sutton, S. Singh, Predictive representation of state, in: Advances in Neural Information Processing...
- B. Wolfe, M. James, S. Singh, Learning predictive state representations in dynamical systems without reset, in:...
- D. Wingate, S. Singh, On discovery and learning of models with predictive state representations of state for agents...
- M. James, Using Predictions for Planning and Modeling in Stochastic Environments, PhD Thesis, Department of Computer...
- P. McCracken, M. Bowling, Online discovery and learning of predictive state representations, in: Advances in Neural...