Influence of the reward on the abstractions

The integration of multiple subsystems is key to building an autonomous system. In this contribution we consider the interface between two subsystems: reward and abstractions. Both have been thoroughly investigated, but to our knowledge mostly in isolation from each other (Oudeyer and Kaplan, 2007; Pezzulo and Castelfranchi, 2007). Our analysis leads to a categorization of abstraction types based on the influence of the reward rather than on the content of the representation. With the help of this overview we show how the integration of different abstraction types supports ongoing development.


Introduction
The integration of multiple subsystems is key to building an autonomous system. In this contribution we consider the interface between two subsystems: reward and abstractions. Both have been thoroughly investigated, but to our knowledge mostly in isolation from each other (Oudeyer and Kaplan, 2007; Pezzulo and Castelfranchi, 2007).
Our analysis leads to a categorization of abstraction types based on the influence of the reward rather than on the content of the representation. With the help of this overview we show how the integration of different abstraction types supports ongoing development.

Categorization of abstraction types
We first clarify what we mean by abstractions. The system can observe its current state in sensorimotor space as well as causality, either between time-delayed measurements (predictive model) or between measurements in different sensory channels (associative model). Both types of causality can be formalized as expectations. As the observation space $O$ is huge, most implementations aim at clustering and discretizing this space: $O \approx \bigcup_{d=1}^{N_d} O_d$, where the $O_d$ are clusters in one of the spaces described above (sensory, motor, sensorimotor, or causality) and $N_d$ is the number of necessary clusters. This is how abstractions, i.e. action or perception primitives, come into play. We use the term abstractions for this initial stage, in contrast to the term symbol, which implies the existence of symbol-symbol links and a grammar.
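The discretization described above can be illustrated with a minimal sketch. It assumes that cluster centroids are already given and that Euclidean distance is a suitable similarity measure; the function names, the centroids, and the sample observations are hypothetical and serve only as illustration.

```python
import math

def assign_cluster(observation, centroids):
    """Return the index d of the centroid closest to the observation."""
    distances = [math.dist(observation, c) for c in centroids]
    return distances.index(min(distances))

def discretize(observations, centroids):
    """Map each raw observation to the index of its cluster O_d."""
    return [assign_cluster(o, centroids) for o in observations]

# Hypothetical 2-D sensorimotor samples and N_d = 2 hand-picked centroids.
centroids = [(0.0, 0.0), (1.0, 1.0)]
observations = [(0.1, -0.2), (0.9, 1.2), (0.4, 0.3)]
labels = discretize(observations, centroids)
```

Each label is the index of one abstraction $O_d$; how the centroids are obtained is exactly where the reward can enter, as discussed in the following sections.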

$O_d^{\mathrm{stat}}$: statistical structuring of the system-environment interaction
The observations made by the system depend on its behavior (Lungarella and Sporns, 2005). If the behavior follows reward optimization, then the reward indirectly influences abstraction building. This statistical influence can be made more explicit: for example, Hart et al. (2006) refine the discretization $O_d$ if the entropy of the behavior choice is high.
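As a rough illustration of entropy-driven refinement, the following sketch marks a cluster for refinement when the empirical distribution of actions chosen in it has high Shannon entropy. It is not the procedure of Hart et al. (2006); the threshold, the action labels, and the function names are assumptions made for this example.

```python
import math
from collections import Counter

def choice_entropy(actions):
    """Shannon entropy (in bits) of the empirical action distribution."""
    counts = Counter(actions)
    total = len(actions)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def clusters_to_refine(actions_per_cluster, threshold_bits):
    """Return the indices of clusters whose action choice is too uncertain."""
    return [d for d, actions in sorted(actions_per_cluster.items())
            if actions and choice_entropy(actions) > threshold_bits]

# Hypothetical log of the actions chosen while the system was in each cluster.
actions_per_cluster = {
    0: ["left", "left", "left", "left"],    # consistent choice, low entropy
    1: ["left", "right", "left", "right"],  # ambiguous choice, high entropy
}
to_refine = clusters_to_refine(actions_per_cluster, threshold_bits=0.5)
```

A cluster with an ambiguous action choice probably mixes situations that call for different behavior, so splitting it is a plausible refinement step.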

$O_d^{\mathrm{dis}}$: discrimination according to reward
The reward can itself be part of the observation.
In that case the observation space can be segmented directly according to the amount of reward (Ishiguro et al., 1996).
Alternatively, the reward or value function can be used as a teaching signal for clustering, as discussed in Körner and Matsumoto (1998).
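Direct segmentation according to the amount of reward can be sketched as follows. The sketch assumes scalar rewards and fixed reward intervals; the bin edges, the sample data, and the function name are hypothetical and are not taken from the cited works.

```python
from collections import defaultdict

def segment_by_reward(samples, bin_edges):
    """Group observations by the reward interval they fall into."""
    segments = defaultdict(list)
    for observation, reward in samples:
        # d counts how many bin edges the reward has reached (0 if none).
        d = sum(1 for edge in bin_edges if reward >= edge)
        segments[d].append(observation)
    return dict(segments)

# Hypothetical (observation, reward) pairs and two reward thresholds.
samples = [((0.1,), -1.0), ((0.4,), 0.2), ((0.5,), 0.3), ((0.9,), 0.9)]
segments = segment_by_reward(samples, bin_edges=[0.0, 0.5])
```

Here the cluster boundaries in observation space are induced entirely by the reward, independent of how similar the observations themselves are.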

$O_d^{\mathrm{opt}}$: optimization of abstractions with respect to reward
Optimization is a qualitatively different way in which the reward can influence the building of abstractions (Wolpert and Kawato, 1998). The authors assume an initial clustering of the sensorimotor flow into predictive models, each paired with an inverse model that produces a desired action. Through repeated action optimization and differentiated model updates, each pair of predictive and inverse models becomes specialized to a particular context.
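The differentiated model update can be sketched with a responsibility signal that weights each predictive model by how well it explains the current observation. This is a strongly simplified sketch in the spirit of Wolpert and Kawato (1998), not their implementation: it assumes scalar states, linear predictors of the form next = gain * state, a Gaussian error model with a hand-picked sigma, and it omits the inverse models. All names and values are hypothetical.

```python
import math

def responsibilities(predictions, observed, sigma=1.0):
    """Soft assignment of the current context to each predictive model."""
    likelihoods = [math.exp(-((observed - p) ** 2) / (2 * sigma ** 2))
                   for p in predictions]
    total = sum(likelihoods)
    return [l / total for l in likelihoods]

def update_models(gains, state, observed, learning_rate=0.5, sigma=1.0):
    """One responsibility-weighted update of the linear predictors."""
    predictions = [g * state for g in gains]
    weights = responsibilities(predictions, observed, sigma)
    # Models that predicted well receive a larger share of the correction.
    return [g + learning_rate * w * (observed - p) * state
            for g, w, p in zip(gains, weights, predictions)]

# Two hypothetical models; the observed transition matches the second one.
gains = [0.5, 2.0]
weights = responsibilities([g * 1.0 for g in gains], observed=1.9, sigma=0.5)
new_gains = update_models(gains, state=1.0, observed=1.9, sigma=0.5)
```

Because the better-matching model receives most of the update, repeated application drives the models toward different contexts, which is the specialization effect described above.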

$O_d^{\mathrm{opt2}}$: optimization of abstractions with respect to internal processing
Finally, abstractions can be introduced as a result of optimizing the operations to be executed on the abstraction layer, e.g. optimization of memorization or planning (Toussaint and Storkey, 2006).
Figure 1 schematically shows the different types of abstractions discussed above. A natural question is whether one type should be preferred to another. We believe that a discussion about the best-suited abstraction type is counterproductive. Instead, we investigate how one abstraction type supports the building of another.

Incremental development
Ongoing development presupposes that the system can build on its abilities by creating favorable interactions, by optimizing skills, and by reorganizing control in the sense that learned regularities are used by higher mental functions (simulation, planning, at- . For stability reasons it is important that a tight coupling of abstractions to behavior, e.g. $O_d^{\mathrm{opt}}$, occurs only at later stages. The results of testing these ideas on real-world applications are summarized in Mikhailova (2010).

Conclusions
Finding an appropriate knowledge representation is one of the hardest problems in research on cognitive systems. In this contribution we did not ask which representation is best; instead, we categorized the different possibilities from the perspective of a complete system that monitors reward acquisition. We showed how one abstraction type supports the building of another in the sense of ongoing development. We hope to encourage a more intensive discussion of the integration of multiple types of learning, as started by Balkenius (1994).

Figure 1: Different types of abstractions according to the influence of the value system. Solid lines show direct influence and dashed lines indirect influence. See the text for more details.