1 Introduction

Comprehensible artificial intelligence (AI) systems that are able to explain what they have learned have gained increasing attention over the past years. In the dissertation Comprehensible Knowledge Base Extraction for Learning Agents – Practical Challenges and Applications in Games [1] (accepted at TU Dortmund University, Germany), this idea is considered in the context of learning agents with applications in games. The dissertation presents approaches that allow for the automated creation of knowledge bases from agent behavior. It aims especially at (1) the creation of human-readable knowledge that provides insights into what an agent has learned, and (2) the investigation of how learning agents can benefit from incorporating these approaches into their learning processes. Applications are presented, e.g., in the context of general video game playing (GVGAI). Moreover, approaches with potential for other domains have been implemented in the InteKRator toolbox. The dissertation is mainly based on the publications [2,3,4,5,6,7,8,9,10]. A summary of contributions can be found in Table 1.

Table 1 Summary of contributions, selected difficulties and challenges, as well as features and results

After a brief overview of related work (Sect. 2), this abstract presents how knowledge bases can be extracted from learned agent behavior (Sect. 3) and how agents can benefit from this with respect to challenges in games (Sect. 4). As a practical result, the InteKRator toolbox is presented (Sect. 5), and a conclusion with an outlook on future work is provided (Sect. 6).

2 Related Work

Works related to the presented approaches can be roughly categorized into the following groups (see [1] for details):

  • learning approaches that are able to provide structural insights into what an agent learns

  • approaches that are geared towards a comprehensible representation of knowledge (i.e., that is not only compact but also easy to read and understand for humans)

  • learning/hybrid agent models in the context of games

  • systems similar to the InteKRator toolbox.

Representatives of the first group are, e.g., Bayesian (or other probabilistic) networks (e.g., [14], Section 8.2.2) or decision trees. Although there are methods to learn the structure of a Bayesian network [15], the structure is often provided in advance, and most graphical approaches become hard to visualize and read for larger numbers of nodes.

As for the second group, the concept of defaults (as described by Reiter in his Default Logic [16]) provides interesting properties for covering larger amounts of knowledge, since many common cases can be covered by a single default, leaving the remaining cases to a few more specific rules. Answer Set Programming (ASP; see, e.g., [11]) allows for implementing these ideas by offering two different kinds of negation (strict vs. default). The approach presented here exploits these ideas on multiple levels of abstraction, allowing for the creation of compact knowledge bases that, in the joint study [10], proved easier for humans to read and comprehend than ASP. In the presented approach, the rules are annotated with weights, which serve as an interface to machine learning approaches when learning such representations from data.

In the third group, the discipline of General Video Game Artificial Intelligence (GVGAI) [17], where agents must learn to play different (a priori unknown) video games, represents a challenging application domain. The GVGAI competition provides benchmarks for agent models in this context. Besides their sensory inputs, agents may be provided with a forward model of a game (i.e., a model that allows for extrapolating future states) or must learn the game mechanics themselves, in addition to good playing/winning strategies. While known methods such as Monte Carlo Tree Search (MCTS) [12] can be applied directly in the first case, the methods in the dissertation focus on the latter.

Concerning the fourth group, other systems exist that cover either machine learning or knowledge representation approaches. One representative of each kind will be briefly considered in the following (both being Java-based like the InteKRator toolbox): Weka [18] is a collection that covers a large number of different machine learning packages. However, Weka is not explicitly geared towards knowledge representation techniques. In contrast, the collection provided by the TweetyProject [19] comprises a large number of mainly logic-based approaches. However, machine learning aspects are not the focus there, and only one of the packages is explicitly related to machine learning. Unlike these established collections, the InteKRator toolbox focuses on the lightweight usage of knowledge representation concepts in combination with learning approaches.

3 From Agent Behavior to Knowledge Base

The principal idea here is to describe the behavior that an agent has learned in its environment in the form of formal knowledge. The knowledge describing the learned behavior should be both compact and general: compact enough to remain readable for humans, and general enough to allow for generalization. In contrast to classical approaches such as first-order logic, the idea of default rules [16] as well as the default negation in ASP [11] allow for covering many common cases of an agent’s environment by focusing on the exceptional cases. Moreover, non-monotonicity is naturally connected to games, since games are highly dynamic environments: as an example, new levels may introduce large changes to the game play while an agent’s overall default behavior (e.g., reaching a goal) might still be useful. The approach presented here is based on the idea of rules with exceptions (and exceptions of exceptions, etc.), where every exception to a rule is itself a rule with a more specific premise. This induces a knowledge hierarchy from general to more specific rules that can be read top-down to gain insights into what an agent has learned.

Different algorithms are presented in [1] to learn such representations from data, i.e., from sequences (or “traces”) of state-action pairs \(( st , a)\) that have been produced by an agent’s learning process. Every \(st\) is considered a conjunction \(s_1 \wedge \cdots \wedge s_n\) where every \(s_i \in \mathbb {S}_i\) is a value of the agent’s ith sensor. The following example provides an intuition of the main ideas (similar examples can be found, e.g., in [1] or [8]).

Example 1

An agent in a grid world with state space \(\mathbb {S} := \mathbb {S}_x \times \mathbb {S}_y\) and action space \(\mathbb {A} := \{\textrm{Left}, \textrm{Right}, \textrm{Up}, \textrm{Down}\}\) has learned to navigate around an obstacle, starting from a state \(st _\textrm{start}:= x_0 \wedge y_0\) to a destination state \(st _\textrm{dest}:= x_7 \wedge y_0\). The state-action sequence resulting from the agent’s trace through the grid world is assumed to be:

$$\begin{aligned} \begin{aligned} SA =&\langle ~ (x_0 \wedge y_0, \text {Up}), \ldots , (x_0 \wedge y_4, \text {Up}),\\&(x_0 \wedge y_5, \text {Right}), \ldots , (x_6 \wedge y_5, \text {Right}),\\&(x_7 \wedge y_5, \text {Down}), \ldots , (x_7 \wedge y_1, \text {Down})~\rangle \end{aligned} \end{aligned}$$

A knowledge base learned from \(SA\) can then look as follows:

$$\begin{aligned} \begin{aligned} KB =&\langle ~ \{\top \rightarrow \textrm{Right}~[0.41]\},\\&\hspace{2em} \{ x_0 \rightarrow \textrm{Up}~[0.83],\\&\hspace{2.5em} x_7 \rightarrow \textrm{Down}~[1.0]\},\\&\hspace{4.5em} \{x_0 \wedge y_5 \rightarrow \textrm{Right}~[1.0]\} ~\rangle \end{aligned} \end{aligned}$$

with annotated weights [w] representing conditional relative frequencies \(w:= P( conclusion \,|\, premise )\).
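For instance, the weight of the topmost rule can be verified directly from the trace: \(\textrm{Right}\) occurs in 7 of the 17 state-action pairs of \(SA\),

$$\begin{aligned} w(\top \rightarrow \textrm{Right}) = \frac{|\{( st , a) \in SA : a = \textrm{Right}\}|}{| SA |} = \frac{7}{17} \approx 0.41, \end{aligned}$$

and analogously, \(x_0\) occurs in six pairs, five of which carry the action \(\textrm{Up}\), yielding \(5/6 \approx 0.83\).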

The algorithm used here to learn \(KB\) first adds the topmost rule \(\top \rightarrow \textrm{Right}~[0.41]\) (since \(\textrm{Right}\) is the most common action in \(SA\)). Afterwards, it tries to find exceptions that cover as many cases as possible from \(SA\), resulting in the two rules for \(\textrm{Up}\) and \(\textrm{Down}\) on the second level. The second-order exception on the bottommost level is added last to cover the case of the grid world’s upper left corner (state \(x_0 \wedge y_5\)), where the agent moved \(\textrm{Right}\) (instead of \(\textrm{Up}\)). (For a more general explanation of the algorithm, see Sect. 5.)

Starting from the most general level, \(KB\) can be read top-down as: “Usually go right, except when perceiving \(x_0\) then go up, or when perceiving \(x_7\) then go down, except when perceiving \(x_0 \wedge y_5\) then go right.”
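The weights of Example 1 can also be reproduced programmatically. The following Python sketch (an illustration only; InteKRator itself is Java-based) encodes the trace \(SA\) as (state, action) pairs and computes the conditional relative frequency for each rule of \(KB\):

```python
# Trace SA from Example 1: (state, action) pairs, where a state
# is represented as the set of its sensor literals.
SA = (
    [(frozenset({"x0", f"y{i}"}), "Up") for i in range(5)]             # (x0,y0)..(x0,y4)
    + [(frozenset({f"x{i}", "y5"}), "Right") for i in range(7)]        # (x0,y5)..(x6,y5)
    + [(frozenset({"x7", f"y{i}"}), "Down") for i in range(5, 0, -1)]  # (x7,y5)..(x7,y1)
)

def weight(premise, conclusion):
    """Conditional relative frequency w = P(conclusion | premise) over SA."""
    matching = [a for st, a in SA if premise <= st]
    return sum(a == conclusion for a in matching) / len(matching)

print(weight(frozenset(), "Right"))              # 7/17 ~ 0.41 (topmost rule)
print(weight(frozenset({"x0"}), "Up"))           # 5/6  ~ 0.83
print(weight(frozenset({"x7"}), "Down"))         # 1.0
print(weight(frozenset({"x0", "y5"}), "Right"))  # 1.0 (second-order exception)
```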

4 Benefit for Learning Agents

Apart from being used for explaining agent behavior, it has also been investigated how learning agents themselves can benefit from incorporating knowledge base extraction approaches into their learning process. The aforementioned approaches have been integrated into two different agent models:

  • a reinforcement/Q-learning-based [20] model [4, 6]

  • an agent model learning a formal forward model of its environment that describes, for a provided state \(st\) and an action \(a\), the expected subsequent state \(st '\) [2, 9].

In the first model, reinforcement learning is integrated with the extraction of a knowledge base from agent behavior during the learning process (cf. Example 1). In the context of different grid world scenarios as well as in a GVGAI game, it was shown that agents benefited already at an early stage of the learning process (\(\approx 10\) to \(15\%\) of the process, according to [1]) from basing their decisions on the extracted knowledge base (rather than on the weights learned through the underlying reinforcement learning approach). Such agents showed an increased learning speed over pure Q-learning in the experiments of [4, 6].
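A minimal sketch of such a hybrid decision policy is given below. The names (`q_table`, `kb.infer`) and the switching criterion are illustrative assumptions, not the actual implementation of [4, 6]:

```python
import random

def choose_action(state, q_table, kb, progress, actions, epsilon=0.1):
    """Hybrid action selection: prefer the extracted knowledge base once it is
    available (early in training, ~10-15% of the process according to [1]);
    otherwise fall back to the underlying Q-learning policy.
    `progress` is the fraction of the training budget used so far."""
    if random.random() < epsilon:          # usual epsilon-greedy exploration
        return random.choice(actions)
    if kb is not None and progress >= 0.1:
        action = kb.infer(state)           # hypothetical reasoning call on the KB
        if action is not None:
            return action
    # fall back to the greedy Q-learning choice
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))
```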

The second model allows an agent to learn forward models, which are then used to apply techniques such as MCTS [12]. Since many games are (near) real-time environments, games often have special performance requirements. Moreover, in the GVGAI competition, agents are usually trained on certain levels of a game and then evaluated on other levels of the same game. To overcome these challenges, the agent model combines forward model learning from observational data with a revision approach: while a forward model is learned in the training phase, in the evaluation phase, the learned forward model is revised when new effects are observed that do not conform to the learned model. In our experiments [2, 9], the agent model was able to quickly learn human-readable forward models of GVGAI games, based on which, e.g., MCTS was successfully applied, allowing the agent to learn to play several GVGAI games from scratch (performance videos can be found in the dissertation’s online appendix [21]).
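Schematically, the evaluation phase can be pictured as follows. This is a rough sketch with hypothetical helper functions (`planner.search`, `model.predict`, `model.revise`), not the implementation of [2, 9]:

```python
def evaluate_episode(env, model, planner):
    """Evaluation phase: plan on the learned forward model (e.g., via MCTS)
    and revise the model whenever an observed effect contradicts it,
    e.g., when a new level introduces unseen game mechanics."""
    state = env.reset()
    while not env.done():
        action = planner.search(state, model)   # e.g., MCTS rollouts on the model
        next_state = env.step(action)
        if model.predict(state, action) != next_state:
            model.revise(state, action, next_state)  # incorporate the new effect
        state = next_state
```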

5 The InteKRator Toolbox

To make the results of this work more accessible to a broader range of applications (also beyond the scope of agents), central concepts have been implemented in the InteKRator toolbox [13]. The toolbox allows for learning comprehensible knowledge bases from data, performing reasoning on these knowledge bases, and revising the learned knowledge bases with new evidence. Moreover, InteKRator provides the possibility to check a learned (or manually modeled) knowledge base against a data set in order to measure the quality of the knowledge. InteKRator has been proposed for use in the medical domain [7] and has recently been used in research on cancer therapy recommendations [22]. The main functionalities of InteKRator are briefly considered in the following:

Learning Algorithm Inputs are a data set with n input columns and one outcome column; the output is a learned knowledge base \(KB = \langle R_1,\ldots , R_{n+1} \rangle\) (cf. Example 1). The algorithm starts with the topmost level \(R_1 \in KB\) by adding a rule with an empty premise whose conclusion reflects the majority of the values in the outcome column. On the next level, rules with premises of length 1 are added, representing exceptions to the rule on the topmost level. This is continued successively, such that rules on a level \(R_j\) represent exceptions to the rules on the levels \(R_{j'}\) with \(j' < j\) (i.e., the levels above \(R_j\)).
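In strongly simplified form, this level-wise construction can be sketched as follows. The sketch adds an exception rule wherever the conditional majority outcome deviates from what the levels above already derive; the actual algorithm additionally covers the data greedily and prunes redundant rules, which yields the more compact knowledge base of Example 1:

```python
from collections import Counter
from itertools import combinations

def derive(kb, literals):
    """Most specific conclusion the current KB yields for the given literals."""
    for level in reversed(kb):
        hits = [r for r in level if r[0] <= literals]
        if hits:
            return max(hits, key=lambda r: r[2])[1]  # highest weight wins
    return None

def learn_kb(data, n):
    """Learn KB = <R_1, ..., R_{n+1}> from rows (inputs: frozenset, outcome)."""
    def majority(premise):
        outs = [y for x, y in data if premise <= x]
        (out, cnt), = Counter(outs).most_common(1)
        return out, cnt / len(outs)

    out, w = majority(frozenset())
    kb = [[(frozenset(), out, w)]]                # R_1: single empty-premise rule
    for length in range(1, n + 1):                # R_2 ... R_{n+1}
        premises = {frozenset(c) for x, _ in data for c in combinations(x, length)}
        level = []
        for p in sorted(premises, key=sorted):
            out, w = majority(p)
            if out != derive(kb, p):              # exception to the levels above
                level.append((p, out, w))
        kb.append(level)
    return kb
```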

Reasoning Inputs are a knowledge base \(KB\) and a set of (assumed) evident knowledge. The algorithm outputs the inference(s) that can be derived from \(KB\) with the help of the evident knowledge (optionally together with the rule(s) from which the inference(s) are derived). The reasoning algorithm searches the knowledge base upwards, starting on the bottommost level, for the most specific rule whose premise is satisfied by the provided evident knowledge, and returns its conclusion. In case multiple rules with the same weight are activated, all of their conclusions are returned.
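In the terms of the learning sketch above, one plausible reading of this bottom-up search, including the tie case, is:

```python
def reason(kb, evidence):
    """Search KB bottom-up for the most specific activated rule(s); if several
    rules with the same (maximal) weight are activated on the decisive level,
    all of their conclusions are returned."""
    for level in reversed(kb):                    # most specific level first
        active = [r for r in level if r[0] <= evidence]
        if active:
            w_max = max(w for _, _, w in active)
            return {concl for _, concl, w in active if w == w_max}
    return set()

# For Example 1's KB: reason(kb, frozenset({"x0", "y5"})) -> {"Right"}
```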

Revision Inputs are a knowledge base \(KB\) and the new knowledge (in the form of one or more input values and one outcome value). The output is the revised knowledge base \(KB '\). In case the outcome value cannot be derived from \(KB\) with the provided input values, the algorithm removes the rule providing the wrong conclusion. If the outcome still cannot be derived, a new rule is added to \(KB\) based on the provided input and outcome values. Although the algorithm can in principle revise on any level \(R_j \in KB\), only revision on the bottommost level \(R_{n+1}\) ensures that the new knowledge is incorporated without any side effects.
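Restricted to the bottommost level, the described two-step procedure might look as follows (a sketch reusing `reason` from above; the weight of the newly added rule is an assumption):

```python
def revise(kb, evidence, outcome):
    """Revise KB with new knowledge on the bottommost level R_{n+1}:
    remove the offending rule(s) first, then add a new rule if needed."""
    if outcome not in reason(kb, evidence):
        for level in reversed(kb):                 # find the deciding level
            active = [r for r in level if r[0] <= evidence]
            if active:
                for r in active:
                    if r[1] != outcome:            # remove wrong conclusion(s)
                        level.remove(r)
                break
        if outcome not in reason(kb, evidence):    # still not derivable?
            kb[-1].append((frozenset(evidence), outcome, 1.0))  # assumed weight
    return kb
```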

Checking Inputs are a knowledge base \(KB\) and a data set; the output is the percentage of data rows for which the outcome can be correctly derived from the input values.
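Checking then reduces to a single pass over the data set, again in the sketch’s terms:

```python
def check(kb, data):
    """Percentage of rows whose outcome is derivable from their inputs via KB."""
    hits = sum(1 for x, y in data if y in reason(kb, x))
    return 100.0 * hits / len(data)
```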

By using the same knowledge base format and by providing a generic interface, these functionalities can be easily combined: learned knowledge bases can be revised and reasoning can be performed on the resulting knowledge bases.

6 Conclusion and Future Work

This dissertation abstract summarized how knowledge can be represented in a comprehensible way and how such representations can be learned from (sensory) data. The major results of the dissertation comprise a comprehensible knowledge representation approach, a complete learning algorithm for learning knowledge bases from data, as well as efficient revision and reasoning algorithms. These approaches have been used successfully for GVGAI research as well as for the implementation of the InteKRator toolbox, which is intended for use in further domains such as medical informatics. Future work could include, e.g., an investigation of when agents should revise extracted knowledge bases, rather than relying on the underlying learning approach, in order to adapt quickly to changes in an environment. Moreover, an extension of the inference approach, a study of its inference properties, as well as the further development of the InteKRator toolbox could be interesting directions.