How Virtual Agents Can Learn to Synchronize: an Adaptive Joint Decision-Making Model of Psychotherapy

Joint decision-making can be seen as the synchronization of actions and emotions, usually via nonverbal interaction between people while they show empathy. The aim of the current paper was (1) to develop an adaptive computational model for the type of synchrony that can occur in joint decision-making for two persons modeled as agents, and (2) to visualize the two persons by avatars as virtual agents during their decision-making. How to model joint decision-making computationally while taking adaptivity into account is rarely addressed, although such models based on psychological literature have many potential applications, such as online coaching and therapeutics. We used an adaptive network-oriented modelling approach to build an adaptive joint decision-making model in an agent-based manner and simulated multiple scenarios of such joint decision-making processes using a dedicated software environment implemented in MATLAB. The Unity 3D engine was used to virtualize this process as nonverbal interaction between virtual agents, including their internal and external states and the scenario. Although our adaptive joint decision model has general application areas, we selected a therapeutic session as the example scenario to visualize and interpret the example simulations.


Introduction
Whenever people come into contact with each other, they tend to spontaneously synchronize or align their nonverbal behavior, physiology and brain signals. The importance of such interpersonal synchrony has been established in multiple social settings. As an example, higher levels of nonverbal synchrony promote cooperation (Wiltermuth & Heath, 2009) and social affiliation (Hove & Risen, 2009). Furthermore, interpersonal synchrony may foster a good working relationship between clients and their therapist during psychotherapy (Koole, Tschacher, Butler, Dikker, & Wilderjans, 2020). A concept closely related to synchrony is facial mimicry. Facial mimicry refers to the matching of individuals' facial expressions with their emotional experiences (Drimalla et al., 2019). Indeed, facial mimicry is a central component within all social interactions (Fischer & Hess, 2017; Hess & Fischer, 2013). More broadly, mimicry is labeled as the matching or imitation of nonverbal behavior, and it can range from facial expressions including pupil dilation (Kret, Fischer, & De Dreu, 2015) to body postures (Chartrand & Bargh, 1999). Although such movement and facial mimicry (including emotional expressions) are linked to each other (Moody & McIntosh, 2011), they are not the same. Facial mimicry in itself carries information about the expresser's appraisal of the event (Hareli & Hess, 2012; van Kleef, 2009), which directly impacts the mimicry. In contrast, body movements themselves do not directly contain such appraisal information, although the receiver can infer an emotional state from such signals based on their own interpretation (Fischer & Hess, 2017). Facial mimicry plays a role for both emotional and cognitive empathy (Drimalla et al., 2019).
In accordance with the above, for joint decision-making processes the outcome consists of (1) a joint action, (2) a common positive feeling about this action and (3) an empathic understanding of this action and feeling (Treur, 2011). In other words, a successful joint decision-making process can be seen as the synchronization or alignment of both actions and positive feelings together with a mutual empathic understanding. Throughout this process, nonverbal interactions play an important role. Moreover, these alignment processes themselves can become attuned by a form of learning or adaptation. Computational modeling of these complex and dynamic processes by means of (virtualized) agents is challenging and has been addressed only partially.
In earlier work (Treur, 2011; Duell & Treur, 2012), joint decision-making based on nonverbal interactions has been modeled within and between agents. However, these models are not adaptive, and the simulations of these models have not been visualized by avatars. Other agent models that address emotion regulation processes (but not joint decision-making) have been successfully developed together with their accompanying avatars (De Jong et al., 2022). By virtualization of the agents, the human-likeness of the agent model can be better demonstrated and viewers can relate more strongly to the generated interaction patterns. Therefore, our aim here is to (1) extend the non-adaptive nonverbal joint decision-making model from (Treur, 2011) to an adaptive model and (2) visualize these adaptive joint decision-making processes by means of avatars. To verify the developed computational model, we conducted multiple simulation experiments, and our main simulation was visualized by virtual agents displayed as avatars. As an illustrative visualized scenario, the focus was on the adaptive development of nonverbal closeness of contact between client and therapist over multiple therapy sessions as a central joint decision.
In this paper, Sect. 2 provides an overview of the background knowledge used to design the introduced model. Section 3 briefly summarizes the modeling approach used, after which the second-order adaptive network model is introduced in Sect. 4. In Sect. 5, the main example simulation is discussed in more detail, including its visualization. Section 6 elaborates on some other simulation results with alternative parameter settings for different personal characteristics. In Sect. 7 a final discussion is provided, and the complete specification of the model is shown in the Appendix.

Background Knowledge
For the design of the introduced computational model, different types of neural or psychological mechanisms found in the literature have been used as building blocks. A number of these mechanisms are of a general nature, whereas other ones are more specifically related to joint decision-making. An overview of them is given in this section.

General psychological mechanisms as building blocks
Several general mechanisms from psychology and neuroscience were used as building blocks to create the internal structure of the virtual agents in order to achieve human-likeness.

Mirroring
The principle of mirroring describes that a person's preparation state for a certain action is also activated when the corresponding action of another person is observed. It can be explained by mirror neurons and mirroring links: the connection from the sensory representation state of the action of the other person to the preparation state of a person's own action (Iacoboni, 2008a). People rely on such mirroring in their nonverbal communication and therefore, mirroring should play an important role in the design of a joint decision process of virtual agents.

Emotion Integration
In addition to mirroring, a virtual agent needs to decide to conduct or not conduct a certain action.
Humans use a prediction loop to internally simulate the predicted effect of an action on their emotional response and feeling states (Damasio, 1994, 1999). The latter states play an important role as 'somatic markers' for making decisions. First, internal simulation of the considered action takes place by a prediction link that activates a sensory representation of the predicted effect of the action. Next, an emotional response and feelings are generated by using links for emotion association to this predicted effect. For the latter, two types of loops can be used: body loops (involving expression of the emotional response and sensing this expression) and as-if body loops (internal simulation of the emotional response, without actually expressing it); e.g., (Boukouvala, 2017; Damasio, 1994; Damasio, 1999; Poppa and Bechara, 2018). In turn, such an associated feeling affects the preparation for the considered action, which makes it a cyclic causal pathway.

Self-Other Distinction
Furthermore, we have relied on control and self-other distinction (Iacoboni, 2008a, pp. 196; Courtney and Meyer, 2020). These involve neurons which are suggested to have a function in control (allowing or suppressing) of action execution after preparation has taken place. In single-cell recording experiments with epileptic patients, cells were found that are active when the person prepares an own action that is executed, but shut down when the action is only observed. This finding leads to the hypothesis that these cells may be involved in the functional distinction between a preparation state generated in order to actually perform the action, and a preparation state generated to interpret an observed action (or both, in case of imitation). More specifically, this has been reported by Mukamel and colleagues (Mukamel et al., 2010; Fried, Mukamel, Kreiman, 2011); see also (Keysers and Gazzola, 2010; Iacoboni, 2008a, pp. 201-203; Iacoboni, 2008b; Iacoboni & Dapretto, 2006). Some of the main findings are that neurons with mirror neuron properties were found in all sites in the mesial frontal cortex where recording took place (approximately 12% of all recorded neurons); half of them related to hand-grasping, and the other half to emotional face expressions. A subset of neurons was found that shows behavior that relates to execution of the action: they have excitatory responses during action execution and inhibitory responses during action observation (Iacoboni, 2008b, p. 30). In (Iacoboni, 2008a, 2008b; Iacoboni and Dapretto, 2006), such types of neurons have been termed super mirror neurons, to indicate the control function they may have with respect to the execution of an action. Some of these cells are sensitive to a specific person, so that an observed action can also be attributed to the person that was observed (self-other distinction) (Iacoboni, 2008a, pp. 201-202). It is suggested that the types of social interaction seen in persons with an autism spectrum disorder can be related to reduced self-other distinction and control of imitation (Brass & Spengler, 2009; Hamilton et al., 2007).

Plasticity and Metaplasticity
Within the cognitive neuroscience literature, two types of (first-order) adaptation or plasticity are often considered, one for connection weights and one for intrinsic neuronal properties such as excitability thresholds; for example, see (Chandra and Barkai, 2018). In this paper, one example of a first-order adaptation principle is considered: Hebbian learning for connection weights. This is a well-known adaptation principle (addressing adaptive connectivity) that can be explained by: 'When an axon of cell A is near enough to excite B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased.' (Hebb, 1949, p. 62). This is sometimes simplified (neglecting the phrase 'as one of the cells firing B') to: 'What fires together, wires together' (Shatz, 1992; Keysers & Gazzola, 2014).
The 'plasticity versus stability conundrum' describes how an organism adjusts its plasticity over time in a context-sensitive manner (Sjöström, Rancz, Roth, & Hausser, 2008, p. 773). Under which circumstances and to which extent such plasticity actually takes place is controlled by so-called metaplasticity; e.g., (Abraham & Bear, 1996; Garcia, 2002; Robinson, Harper, & McAlpine, 2016; Sjöström, Rancz, Roth, & Hausser, 2008). In the aforementioned literature, various studies have shown how adaptation (as described, for example, by Hebbian learning) is modulated by accelerating the adaptation process or decelerating or even blocking it. Among the reported factors affecting plasticity in such a way are stimulus exposure, activation, previous experiences, and stress. For example, Robinson and colleagues noted that 'Adaptation accelerates with increasing stimulus exposure' (Robinson et al., 2016, p. 2). This indeed describes a form of metaplasticity that controls the speed of adaptation (learning rate). This principle for metaplasticity has also been applied in the present computational model.

Building blocks for the joint decision-making context
As mentioned in Sect. 1, a component of interpersonal synchrony regards (facial/emotional and movement) mimicry. Emotional mimicry is defined as the matching or imitation of each other's (facial) emotions. The Contextual Model of Emotional Mimicry relies on two assumptions, namely (1) a shared mind is the foundation of emotional mimicry (Oatley, 2015) and (2) there is no mimicry of the facial movements themselves but of the meaning of these movements (Fischer & Hess, 2017). Concretely, the shared mind assumption (1) means that both the expresser and mimicker have a minimal affiliation and perspective in common that provoke the mimicry. Assumption (2) implies that although the mimicry itself occurs automatically, emotional mimicry is goal-driven instead of stimulus-driven. In other words, emotional mimicry is a top-down process that originates from prior experiences in social interactions regarding the interpretation of others' facial movements (Fischer & Hess, 2017). This theory and these findings also mean that when individuals do not want to become emotionally closer, they will (unconsciously) not mimic the emotions of others.
Several studies have demonstrated a bidirectional association between facial mimicry and emotional empathy. Emotional empathy is described as the emotional reaction to and inference of another's emotional state, meaning that the emotional states of both persons are congruent, together with the ability to differentiate between self and other (Eisenberg & Fabes, 1990). People who score higher on empathic traits mimic more consistently with each other (Dimberg, Andréasson, & Thunberg, 2011; Sonnby-Borgstrom, 2002). Conversely, a shared emotional state (contributing to emotional empathy) is promoted by the mimicking of each other's facial expressions (Stel, van Baaren, & Vonk, 2008). In sum, empathy and facial mimicry appear to be involved together during tasks.
A system closely related to the emotion system is approach behavior and its motivation. Approach motivation refers to the strong desire to move forward (Harmon-Jones et al., 2013; Koole, Veenstra, Domachowska, Dillon, & Schneider, in press) and, vice versa, avoidance motivation is the desire to move away. Both approach and avoidance behavior and their motivation can vary in their intensity (Harmon-Jones, Price, & Gable, 2012). Traits, moods and external stimuli can all elicit approach behavior (Harmon-Jones et al., 2013). Although it is often thought that approach motivation is linked to positive states (e.g., happiness) and, in contrast, avoidance motivation to negative states (e.g., sadness) (Watson, 2000), approach motivation can also be related to negative states like anger (Carver & Harmon-Jones, 2009). This implies that the approach motivation and the emotion system are two separate systems; they are connected to each other, but not the same.
There is also experimental evidence that the body posture of leaning forwards enhances approach motivation (Price & Harmon-Jones, 2011). In one experiment, approach motivation was indirectly measured through the relative left frontal cortical activity, and three body posture conditions (leaning, upright and reclining) were created. It turned out that participants who leaned forward with their arms extended (a body posture that is normally used to grasp a desired object) displayed higher activation in the relative left frontal cortical area than those who were leaning backwards. In other research, leaning backwards inhibited approach-motivated anger (Harmon-Jones & Peterson, 2009; Koole, Veenstra, Domachowska, Dillon, & Schneider, in press). In sum, approach behavior in the sense of a body posture of leaning forwards characterizes (and thereby visualizes) the state of elevated approach motivation. Approach motivation in itself reflects the desire to move towards a goal and is therefore always involved during joint decision-making. In such joint decision-making, the (connected) separate emotion system and the linked empathy play a role as well.

Method: Self-Modeling Network Modeling
The designed virtual agents and their simulations are based on models of their internal mental processes. To achieve this, the adaptive network-oriented modeling approach presented in (Treur, 2019, 2020) has been used to create a dynamic and adaptive interplay of mental states. Following (Treur, 2020), a network model is characterized by (here X and Y denote nodes of the network, also called states; they have time-dependent activation values X(t) and Y(t)):

• Connectivity characteristics
Connections from a state X to a state Y and their weights $\omega_{X,Y}$

• Aggregation characteristics
For any state Y, some combination function $c_Y(..)$ defines the aggregation that is applied to the impacts $\omega_{X,Y} X(t)$ on Y from its incoming connections from states X

• Timing characteristics
Each state Y has a speed factor $\eta_Y$ defining how fast it changes for a given (aggregated) causal impact.
The following generic difference (or related differential) equations, which are used for simulation purposes and also for analysis of such temporal-causal networks, incorporate these network characteristics $\omega_{X,Y}$, $c_Y(..)$, $\eta_Y$ in a standard numerical format:

$$Y(t + \Delta t) = Y(t) + \eta_Y \left[ c_Y\big(\omega_{X_1,Y} X_1(t), \ldots, \omega_{X_k,Y} X_k(t)\big) - Y(t) \right] \Delta t \qquad (1)$$

for any state Y, where $X_1$ to $X_k$ are the states from which Y gets its incoming connections. Within the dedicated software environment described in (Treur, 2020, Ch. 9), currently around 50 useful basic combination functions are included in a combination function library. The above concepts make it possible to design network models and their dynamics in a declarative manner, based on mathematically defined functions and relations. The examples of basic combination functions that are applied in the model introduced here can be found in Table 1.
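As an illustration of how Eq. (1) can be iterated numerically, the following minimal Python sketch applies it to a toy two-state network. It is not the dedicated MATLAB environment used in this paper; the names `step`, `simulate` and `scaled_sum`, the scaled-sum combination function, and all parameter values are assumptions chosen only for the example.

```python
from typing import Callable, Dict, List, Tuple

def scaled_sum(impacts: List[float], scale: float = 1.0) -> float:
    """Illustrative combination function: sum of the impacts divided by a scaling factor."""
    return sum(impacts) / scale

def step(state: Dict[str, float],
         incoming: Dict[str, List[Tuple[str, float]]],
         speed: Dict[str, float],
         combine: Dict[str, Callable[[List[float]], float]],
         dt: float) -> Dict[str, float]:
    """Apply Eq. (1) once to every state, using only the values at time t."""
    new_state = {}
    for y, value in state.items():
        sources = incoming.get(y, [])
        if not sources:
            new_state[y] = value  # no incoming connections: value is kept
            continue
        # impacts omega_{X,Y} * X(t) from all incoming connections
        impacts = [weight * state[x] for x, weight in sources]
        new_state[y] = value + speed[y] * (combine[y](impacts) - value) * dt
    return new_state

def simulate(state, incoming, speed, combine, dt, steps):
    """Iterate Eq. (1) and return the list of visited states."""
    trace = [dict(state)]
    for _ in range(steps):
        state = step(state, incoming, speed, combine, dt)
        trace.append(dict(state))
    return trace

# Toy network (hypothetical values): constant X drives Y via weight 0.8.
initial = {"X": 1.0, "Y": 0.0}
incoming = {"Y": [("X", 0.8)]}
speed = {"X": 0.0, "Y": 0.5}
combine = {"Y": scaled_sum}
trace = simulate(initial, incoming, speed, combine, dt=0.5, steps=200)
```

In this toy setting Y moves a fraction $\eta_Y \Delta t$ of the remaining distance towards the aggregated impact at every step, so it settles at the impact value 0.8; a larger speed factor only shortens the time needed to get there.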

[Table 1. Basic combination functions used in the model (columns: Notation, Formula, Parameters), including the advanced logistic sum.]

Realistic network models are usually adaptive: often not only their states but also some of their network characteristics change over time. By using a self-modeling network (also called a reified network), a similar network-oriented conceptualization can also be applied to adaptive networks to obtain a declarative description using mathematically defined functions and relations for them as well; see (Treur, 2020). This works through the addition of new states to the network (called self-model states) which represent (adaptive) network characteristics. In the graphical 3D-format as shown in Section 4, such additional states are depicted at a next level (called self-model level or reification level), where the original network is at the base level.
As an example, the weight $\omega_{X,Y}$ of a connection from state X to state Y can be represented (at a next self-model level) by a self-model state named $W_{X,Y}$. Such states are generally called W-states. For Hebbian learning (Hebb, 1949) this self-model state is connected as shown in Fig. 1. By using the function $hebb_{\mu}(..)$ from Table 1 as combination function, based on generic difference equation (1), this self-model state $W_{X,Y}$ models Hebbian learning as its behavior. As the outcome of such a process of network reification is also a network model itself, as has been shown in detail in (Treur, 2020, Ch. 10), this self-modeling network construction can easily be applied iteratively to obtain multiple orders of self-models at multiple (first-order, second-order, etc.) self-model levels, for example, to model metaplasticity as discussed in Sect. 2; e.g., (Abraham & Bear, 1996). For instance, a second-order self-model may include a second-order self-model state $H_{W_{X,Y}}$ representing the speed factor $\eta_{W_{X,Y}}$ for the (learning) dynamics of first-order self-model state $W_{X,Y}$, which in turn represents the adaptation of connection weight $\omega_{X,Y}$; see Fig. 2 for the connectivity. Such states are generally called $H_W$-states; they can be considered learning rates of the learning modeled by the corresponding states $W_{X,Y}$. This can be used to model the second-order adaptation principle 'Adaptation accelerates with increasing stimulus exposure' (Robinson, Harper, McAlpine, 2016) discussed in Section 2. As combination function for such $H_W$-states the function $alogistic_{\sigma,\tau}(..)$ from Table 1 can be used. This second-order adaptation can be interpreted as a context-sensitive form of control over the first-order adaptation: the plasticity only occurs in specific (relevant) contexts.
The two cases of self-model states depicted in Fig. 1 and Fig. 2 have been used in the model discussed in Sect. 4 in order to obtain a second-order adaptive network model for joint decision-making applying context-sensitive control of Hebbian learning.
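To make the two self-model levels concrete, the update of a W-state can be written out by instantiating Eq. (1). The derivation below is a sketch that assumes the forms of the Hebbian and advanced logistic sum functions commonly given in the network-oriented modelling literature (Treur, 2020), with persistence parameter $\mu$, steepness $\sigma$ and threshold $\tau$; Table 1 remains the authoritative definition for the model.

```latex
\begin{align*}
\mathrm{hebb}_{\mu}(V_1, V_2, W) &= V_1 V_2 (1 - W) + \mu W \\
\mathrm{alogistic}_{\sigma,\tau}(V_1, \ldots, V_k) &=
  \left[ \frac{1}{1 + e^{-\sigma (V_1 + \cdots + V_k - \tau)}}
       - \frac{1}{1 + e^{\sigma \tau}} \right] \left( 1 + e^{-\sigma \tau} \right) \\
W_{X,Y}(t + \Delta t) &= W_{X,Y}(t) + H_{W_{X,Y}}(t)
  \left[ X(t)\, Y(t)\, \bigl(1 - W_{X,Y}(t)\bigr) + \mu\, W_{X,Y}(t) - W_{X,Y}(t) \right] \Delta t
\end{align*}
```

In the last line the speed factor $\eta_{W_{X,Y}}$ of Eq. (1) has been replaced by the current value of the second-order state $H_{W_{X,Y}}$. Under these assumed forms, $W_{X,Y}$ does not change while $H_{W_{X,Y}}(t) = 0$, and for $\mu = 1$ the bracketed term reduces to $X(t)\,Y(t)\,(1 - W_{X,Y}(t))$, which is non-negative for activation values in [0, 1], so the weight can only grow towards 1.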

The Adaptive Network Model for Joint Decision-Making
Recall from Sect. 1 that our research aims are to (1) extend the non-adaptive nonverbal joint decision-making model from (Treur, 2011) to an adaptive model and (2) visualize these adaptive joint decision-making models by avatars. The current section addresses (1), whereas (2) is addressed in the next section.
The adaptive joint decision model introduced here consists of three levels: a base level, a first-order self-model level to model plasticity, and a second-order self-model level to model metaplasticity, which controls the plasticity. For the base level it takes the nonadaptive model from (Treur, 2011) as a point of departure; see also (Van Ments & Treur, 2021). The first-order and second-order self-model levels are added here. The overall network connectivity for the base level of a single agent is displayed in the blue (base) plane in Fig. 3. A novelty introduced here is that we made the joint decision model second-order adaptive, meaning that both learning and the control over it are incorporated within the model. To achieve this form of adaptivity, we relied on Hebbian learning modeled by self-model W-states at the middle level (first-order self-model level) above the base level (the green plane in Fig. 3). On top of that, we added $H_W$-states, which in turn control the learning rates for the Hebbian learning. These $H_W$-states are at the third level of the model (second-order self-model level), displayed by the pink plane in Fig. 3. In Fig. 4, the interaction connections between the two agents are displayed. The literature briefly discussed in Sect. 2 has been used as a basis for the neurologically inspired adaptive network model presented here:
• Decision-making is based on emotions associated to predicted effects of action options
• Both the tendency to go for an action and the associated emotion are transferred between agents via mirroring processes using internal simulation
• These mirroring processes at the same time induce a gradual process of mutually tuning the considered actions and their associated emotions
• The outcome of such a joint decision process in principle involves three elements:
  o a common action option
  o a shared positive feeling for the effect of this action option
  o mutual empathic understanding for both the action and the feeling
In the network model, s denotes a stimulus, ac an option for an action to be decided about, and e the effect of the action. The effect state e has an associated feeling state bo, which is considered to be positive for the agent. So, s, ac, e, bo are parameters for stimuli, actions, effects, and body states, and B is a variable for agents; multiple instances of each of them can occur. The states used in the model are summarized in Table 1.
The network model uses ownership states for actions ac and their related effects e, both for self and other agents, indicated by $os_{B,s,ac,e}$ with B another agent or self (see Fig. 3). In addition, ownership states are used for emotions indicated by body state bo, both for self and other agents, specified by $os_{B,e,bo}$ with B another agent or self. As an example, the four arrows to $os_{B,s,ac,e}$ in Fig. 3 show that an ownership state $os_{B,s,ac,e}$ is affected by the preparation state $ps_{ac}$ for the action ac, the sensory representation $srs_{bo}$ of the emotion bo associated to the predicted effect e, the sensory representation $srs_{s}$ of the stimulus s, and the sensory representation $srs_{B}$ of agent B.
Prediction of effects of prepared actions is modelled using the connection from the preparation $ps_{ac}$ of the action ac to the sensory representation $srs_{e}$ of the effect e. Suppression of the sensory representation of a predicted effect of a self-initiated action is modelled by the (inhibiting) connection from the self-ownership state $os_{Self,s,ac,e}$ to sensory representation $srs_{e}$; e.g., (Moore & Haggard, 2008). The control exerted by the self-ownership state for action ac is modelled by the connection from $os_{Self,s,ac,e}$ to $es_{ac}$. Displaying ownership for an action (a way of expressing recognition of the other agent's states, as a verbal part of showing empathic understanding) is modelled by the connection from the other-ownership state $os_{B,s,ac,e}$ to the communication effector state $ec_{B,s,ac,e}$. Similarly, displaying ownership for an emotion bo associated to effect e is modelled by the connection from the other-ownership state $os_{B,e,bo}$ to the communication effector state $ec_{B,e,bo}$. Preparation for action ac is affected by:
• the sensory representation of stimulus s
• the body state bo for the emotion associated to the predicted effect e of the action
• observation of the action (tendency) in another agent
The first bullet is an agent-independent external trigger for the action. The second bullet models the impact of the emotion bo associated to the action effect e. The third bullet models the mirroring effect for the action as observed as a tendency in another agent. This is similar for the preparation of a body state bo; here the sensory representation of the (predicted) effect e serves as a trigger, and the emotion state of another agent is mirrored.
Ownership states for an action ac or body state bo keep track of an agent B's context with respect to the action or body state. This context concerns both the agent self and the other agents; it is a basis for attribution of an action or emotion to an agent and thus covers self-other distinction. Moreover, a self-ownership state is used to control execution of prepared actions or body states. For example, in case the agent B is self, the ownership state for action ac strengthens the initiative to perform action ac as a self-generated action: executing a prepared action depends on whether a certain activation level of the ownership state for the agent self is available for this action. This is how control over the execution of the action (like a go/no-go decision) is exerted, and it can, for example, be used to veto the action in a stage of preparation. Expressing other-ownership to the other agent represents an agent's acknowledgement that it has noticed the state of the other agent: a verbal part of an empathic response. These communications depend on the other-ownership states.
Plasticity was modeled using Hebbian learning (Hebb, 1949) for two mirroring connections for each agent:
• for the mirroring connection from an agent A's sensory representation state $srs_{B,ac,A}$ of an agent B's action ac (tendency) to preparation state $ps_{ac,A}$ of A for ac
• for the mirroring connection from an agent A's sensory representation state $srs_{B,bo,A}$ of B's emotional body state bo to preparation state $ps_{bo,A}$ of A for the same emotion bo
This form of learning was modeled by the first-order self-model states $W_{srs_{B,ac,A},ps_{ac,A}}$ and $W_{srs_{B,bo,A},ps_{bo,A}}$ in the middle plane; they use the combination function $hebb_{\mu}$ (see Table 1). Via the upward connections from $srs_{B,ac,A}$ and $ps_{ac,A}$ to $W_{srs_{B,ac,A},ps_{ac,A}}$ it is monitored whether these base states are 'firing together' (see Sect. 2). Accordingly, the value of self-model state $W_{srs_{B,ac,A},ps_{ac,A}}$ is updated using the W-state's combination function $hebb_{\mu}(..)$, thus obtaining 'wiring together'. The resulting value of $W_{srs_{B,ac,A},ps_{ac,A}}$ is used by $ps_{ac,A}$ via the downward connection from $W_{srs_{B,ac,A},ps_{ac,A}}$ to $ps_{ac,A}$. The Hebbian learning mechanism for the other adaptive connection, concerning mirroring of emotions, was modeled similarly. By making the weights of these mirroring connections adaptive based on Hebbian learning, over time the agents become more responsive to each other, and as a result it becomes easier for them to reach a joint decision than in the nonadaptive case described in (Duell and Treur, 2012; Treur, 2011).
However, we did not assume that plasticity occurs unconditionally. Instead, we assumed that the extent of plasticity is context-sensitive, which is a more realistic assumption; e.g., (Abraham & Bear, 1996; Robinson et al., 2016; Sjöström et al., 2008), see also Sect. 2. To model this, second-order self-model states $H_{W_{srs_{B,ac,A},ps_{ac,A}}}$ and $H_{W_{srs_{B,bo,A},ps_{bo,A}}}$ (and similarly for the other agent's W-states) were added that represent the adaptation speed (learning rate) of the W-states. They affect the adaptive dynamics of the W-states through the downward (pink) connections to them. No plasticity occurs when these $H_W$-states have value 0, and the higher their values, the higher the adaptation speed. In this way, the second-order adaptation principle 'Adaptation accelerates with increasing stimulus exposure' (Robinson et al., 2016) was modeled (see also Sect. 2). More specifically, the upward connections from base states $srs_{B,ac,A}$ and $ps_{ac,A}$ to the related $H_W$-state monitor the exposure at the base level and adapt the level of the $H_W$-states accordingly. To this end, the $H_W$-states use a common logistic combination function, which is monotonically increasing.
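The interplay of the three levels for a single mirroring connection can be sketched in a few lines of Python. This is a simplified illustration and not the role-matrix specification from the Appendix: it tracks only one sensory representation state, one preparation state, their W-state and the related $H_W$-state. The function forms, the speed factors and the parameter values (`sigma`, `tau`, `mu`, `w0`) are assumptions made for the example.

```python
import math

def alogistic(values, sigma, tau):
    """Advanced logistic sum (assumed form; Table 1 of the paper is authoritative)."""
    total = sum(values)
    raw = 1.0 / (1.0 + math.exp(-sigma * (total - tau)))
    offset = 1.0 / (1.0 + math.exp(sigma * tau))
    return (raw - offset) * (1.0 + math.exp(-sigma * tau))

def hebb(v1, v2, w, mu):
    """Hebbian learning combination function with persistence mu (assumed form)."""
    return v1 * v2 * (1.0 - w) + mu * w

def simulate_mirroring(exposure, dt=0.5, steps=400, w0=0.2, mu=0.98):
    """Simulate one adaptive mirroring connection srs -> ps.

    exposure(t) is the observed action tendency of the other agent in [0, 1].
    Returns a list of (srs, ps, w, h) tuples, one per time step.
    """
    srs, ps, w, h = 0.0, 0.0, w0, 0.0
    history = []
    for k in range(steps):
        observed = exposure(k * dt)
        # Base level: sensing the other agent, and mirroring via the adaptive weight w.
        new_srs = srs + 0.5 * (observed - srs) * dt
        new_ps = ps + 0.5 * (alogistic([w * srs], sigma=5.0, tau=0.3) - ps) * dt
        # Second-order level: the H_W-state monitors exposure at the base level.
        new_h = h + 0.3 * (alogistic([srs, ps], sigma=5.0, tau=0.8) - h) * dt
        # First-order level: Hebbian learning whose speed factor is the H_W-state value.
        new_w = w + h * (hebb(srs, ps, w, mu) - w) * dt
        # Synchronous update: all new values were computed from the old ones.
        srs, ps, w, h = new_srs, new_ps, new_w, new_h
        history.append((srs, ps, w, h))
    return history

no_exposure = simulate_mirroring(lambda t: 0.0)
full_exposure = simulate_mirroring(lambda t: 1.0)
```

With these illustrative settings, the run without exposure leaves the $H_W$-state at 0 and therefore the weight at its initial value, whereas sustained exposure first raises the learning rate and then strengthens the mirroring connection. This is the qualitative behavior described above; the actual trajectories of the model depend on the full specification in the Appendix.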
The full specification of the model in terms of role matrices that can be directly executed (thus supporting reproducibility) can be found in the Appendix Section at the end of the paper.

Simulation Results of the Main Scenario Including their Visualizations
In this section the results of our main simulated example scenario and its visualization by avatars are discussed.

Example scenarios
A wide variety of individual and situational differences can be observed in the real world.Accordingly, there are many possible outcomes for joint decision-making (Duell & Treur, 2012).From a modeling perspective, all these differences can be captured by different settings for the network characteristics ω X,Y , c Y (..) and η Y defining the model.For example, if an agent shows poor mirroring, this may be due to weak internal mirorring links (agent characteristic) or just because it is almost dark so that visibility of the other agent is poor (situational characteristic).In the former case, this can be modeled by giving the internal mirroring connection (see Fig. 3) a low weight, whereas in the latter case the inter-agent connection (see Fig. 4) can be given a low weight.In such a way the variety of different combinations of values for the network characteristics ω X,Y , c Y (..) and η Y can reflect or match the variety of individual and situational differences in the real world.
In the first scenario, outlined in the current section, we have chosen consecutive therapeutic sessions as the repetitive stimulus s representing the therapy context.Agent A is visualized as a male client and agent B as a female therapist.The central joint decision during these therapeutic sessions regards approach behavior, represented by leaning forwards and backwards.In our example visualizations, a key difference between the therapist and client regards that the therapist (agent B) is relatively eager to conduct the specific action closeness of contact and the client initially does not want this closeness in the contact.This action approach motivation serves as a visualization for the es ac state of each agent.Moreover, the facial expression that goes from neutral (most negative affect) to smiling (most positive affect) regards the visualization of the es bo state of each agent.This body state es bo shows how both therapist and client feel about the decision over time.
The three scenarios considered in Sects. 5 and 6 differ in the responsiveness to the external stimulus s for the closeness of contact between therapist and client. The state ws_s represents the therapy sessions: it has value 1 during a session and 0 when no session takes place; see Figs. 3 and 4. The differences in this stimulus responsiveness (or eagerness) have been modeled by differences in the weights of the connections from stimulus representation to response preparation: the weights ω_srss,A,psac,A and ω_srss,B,psac,B of the connections from the stimulus representations srs_s,A and srs_s,B to the respective response preparations ps_ac,A and ps_ac,B of client A and therapist B. The variations in this responsiveness strength on each side are shown in Table 3, where the numbers are the weights of the connections from the stimulus representation srs_s to the closeness action preparation ps_ac. In Scenarios 1 and 3 it is assumed that the therapist is experienced and has a high initial responsiveness as part of her professional repertoire. For example, in Scenario 1, addressed in the current section, the responsiveness of the therapist is high (weight 1), whereas the weight of the corresponding connection of the client is low (weight 0.1). This is based on the assumption that over the years the therapist has become experienced in responding to the type of stimuli that occur during therapy sessions. In contrast, therapy sessions are assumed to be new for the client. Note that the above only concerns the responsiveness to the general therapy context stimulus s.
In addition, responsiveness to the (dynamic) signals exchanged between therapist and client during the sessions plays an important role; this responsiveness takes place in particular through mirroring.
While the aforementioned connections from stimulus s to response preparations are assumed nonadaptive, the mirroring connections were modeled as adaptive and therefore can and preferably will strengthen within and over sessions.
The overall views of the simulations for the three scenarios indicated in Table 3 are depicted in Fig. 5. As this figure is not easy to read, subsequent figures in Sects. 5.3 and 5.4 show parts of it from specific views in order to illustrate the different phenomena that occur. Nevertheless, Fig. 5 roughly shows that:
• in Scenario 1 (upper graph), with a therapist who is highly responsive to the therapy context, there is a breakthrough in the fourth session,
• in Scenario 2 (middle graph), with a weakly responsive therapist, there is no breakthrough at all,
• in Scenario 3 (lower graph), where therapist and client are both responsive, there is already a breakthrough in the second session.
What exactly such a breakthrough is will be explained in Sect. 5.3 for Scenario 1 and in Sect. 6 for the other two scenarios.

Visualization method
The scenarios are visualized in Unity using free assets (A2 Games, 2020; A2 Games, 2021) for the male and the female agent, respectively. In each scenario the female agent represents the therapist and the male agent represents the client. The assets for the room in the background were provided by a Unity asset (DevDem, 2020). The room's design was slightly changed from the provided example to make it look more like a therapy room. Animations were either included with the character assets or taken from (Adobe, 2008). The code that controls the flow of the sessions and the characters was written by us. The end time of this example simulation was 700 and the step size was 0.5. Some of the simulation results are depicted in Figs. 5 and 6. As displayed in Fig. 5, both the stimulus and the non-stimulus intervals have length 50. Each stimulus interval represents a single therapeutic session.
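The session timing described above (end time 700, step size 0.5, alternating stimulus and non-stimulus intervals of length 50) can be sketched as a simple block signal for ws_s. Whether the simulation starts with a session or with a pause is not stated here, so the sketch exposes this as the hypothetical parameter `session_first`; the function names are likewise illustrative only.

```python
def ws_s(t, interval=50.0, session_first=True):
    """Return 1.0 if time t falls inside a therapy session, else 0.0."""
    block = int(t // interval)   # index of the interval that contains t
    in_session = (block % 2 == 0) == session_first
    return 1.0 if in_session else 0.0


def stimulus_trace(end_time=700.0, dt=0.5):
    """Sample ws_s at every step from t = 0 up to (not including) end_time."""
    n_steps = int(end_time / dt)
    return [ws_s(k * dt) for k in range(n_steps)]


trace = stimulus_trace()
# Count sessions: one if a session is active at t = 0, plus every 0 -> 1 transition.
n_sessions = int(trace[0]) + sum(
    1 for prev, cur in zip(trace, trace[1:]) if prev == 0.0 and cur == 1.0
)
print(len(trace), n_sessions)
```

Under these assumptions the trace has 1400 samples and contains seven sessions, which leaves room for the breakthrough in the fourth session of Scenario 1 and several sessions after it.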

Scenario 1: Strong responsiveness of therapist, weak responsiveness of client
In this first scenario, the therapist has a strong responsiveness to the therapy context stimulus s (weight 1) and the client a low responsiveness (weight 0.1). From the first therapeutic session onwards, the therapist starts to execute a high level of closeness in contact and also has a high feeling body state (a good feeling) about this closeness of contact; see the upper graph in Fig. 5 for the overall simulation and Figs. 6 to 8 for specific views. The dashed lines refer to states of agent B (the therapist), the solid lines to states of agent A (the client). From Fig. 6 onwards, corresponding states of the two agents have the same color. In contrast, the client does not display any closeness of contact or happy feeling at all during the first three therapeutic sessions. In the fourth therapeutic session, however, there is a breakthrough: the client starts feeling better about the closeness in contact with the therapist and almost immediately afterwards starts to execute this closeness in contact too. At this point the adaptivity based on Hebbian learning has made the mirroring connections strong enough to achieve a joint decision, which was not possible with the initial settings. In the subsequent therapeutic sessions, both therapist and client are close in their contact and feel good about this closeness in the therapeutic relationship (joint decision). One of the alternative simulations is Scenario 3 (see the lower graph in Fig. 5 and the more specific views in Figs. 13 and 14), where the breakthrough already happens in the second stimulus episode (therapeutic session) due to the higher responsiveness of the client.
From the first therapy session on, almost all states of the therapist become activated within each session (Fig. 7). Nevertheless, the sensory representation states based on observing ac and bo of the client stay low until the fourth session, as the client does not express them earlier. In between sessions, the states of the therapist do not (completely) vanish.

The patterns of plasticity
Fig. 8 shows how the adaptation processes take place based on the W-states for (Hebbian) plasticity (pink and purple lines) and the H_W-states for metaplasticity (blue and green lines). Regarding the first-order adaptivity, the W-states of the client start to increase after the first session. From the first session onwards, the therapist shows activation of the expression of both ac and bo, which are sensed by the client (see Fig. 8). As can be seen, the mirroring links strengthen by Hebbian learning during the sessions after session 1: slowly for action ac (purple line) and somewhat faster for feeling bo (pink line). This goes hand in hand with an increase in the preparation states for ac and bo, but not in the execution states that express or execute the action ac or feeling bo; the latter states stay low during the first three sessions (see Fig. 8).
During the fourth session, as a breakthrough, both the W-states and the execution states become high; see the green line for ac and the bordeaux red line for bo in Fig. 8. Regarding the therapist (dashed lines), the W-states for both ac and bo tend to decrease slightly until the fourth session (see Fig. 8). This has two reasons: (1) because no activations of the client are sensed, the connected states are not activated, and (2) persistence is not perfect, as the persistence factors μ are 0.995 and not 1. This means that extinction takes place: per time unit, 0.5% of the learnt value is lost. After the breakthrough, the W-states of the therapist increase sharply, because the therapist then senses very high execution values of ac and bo from the client, so that her own connected states become strongly activated. These patterns show how, during successive therapy sessions, a learning process strengthens the mirroring within the client, which has to reach a certain level before it becomes visible in the action execution and emotion expression.
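The extinction figure of 0.5% per time unit can be checked with a small worked example. The sketch assumes the Hebbian combination function in the form commonly used in network-oriented modeling, hebb_μ(V1, V2, W) = V1 V2 (1 − W) + μW, which is not spelled out in this section, and a fixed speed factor of 1 (in the model itself the speed is governed by the H_W-states). The function names and the example values 0.8, 0.2 and 0.9 are hypothetical.

```python
def hebb(v1, v2, w, mu=0.995):
    """Hebbian combination function with persistence factor mu (assumed form)."""
    return v1 * v2 * (1.0 - w) + mu * w


def update_w(w, v1, v2, eta=1.0, dt=0.5, mu=0.995):
    """One Euler step: W(t+dt) = W(t) + eta * (hebb(v1, v2, W(t)) - W(t)) * dt."""
    return w + eta * (hebb(v1, v2, w, mu) - w) * dt


# Extinction: neither connected state is active for one time unit.
w0 = 0.8
w = w0
for _ in range(2):                  # two steps of size 0.5 = one time unit
    w = update_w(w, 0.0, 0.0)
loss_fraction = (w0 - w) / w0       # just under 0.005, i.e. about 0.5%

# Learning: both connected states are strongly active during one step.
w_up = update_w(0.2, 0.9, 0.9)

print(loss_fraction, w_up)
```

With inactive states the update reduces to W(t+Δt) = W(t) (1 − (1 − μ) η Δt), which explains the slow decline of the therapist's W-states before the breakthrough; with active states the product term dominates and the weight rises quickly.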

The patterns of metaplasticity
Learning itself manifests differently depending on the circumstances. In this case the learning depends on the exposure to activation, according to the metaplasticity principle discussed earlier: 'Adaptation accelerates with increasing stimulus exposure' (Robinson et al., 2016). This metaplasticity principle is modeled by the H_W-states. For the client, these H_W-states are depicted by the solid green and light blue lines (for bo and ac, respectively) in Fig. 8. As can be seen, the H_W-state for ac follows the therapy sessions, as during these episodes the therapist's actions are sensed. The H_W-state for bo follows a different pattern, as the emotion-related states (srs, ps and es states) vary less over time. Note that the breakthrough in the fourth session goes together with a steep increase of the H_W-states. The H_W-states of the therapist follow a similar pattern, with some slight differences.

The virtualizations
For the virtualizations, see Figs. 9 and 10. The feeling is visualized through the facial expressions, and the closeness in contact through leaning backwards and forwards.

Simulation Results for the Alternative Scenarios 2 and 3

Scenario 2
In this scenario (see Figs. 11 and 12) both client and therapist have a low responsiveness to the general therapy context s: both have weight 0.1 for the stimulus-response connection for closeness of contact. This models, for example, a therapist who does not yet have much experience or who for other reasons is not able to be responsive to the therapy context. Only a few states reach values around 0.9, and they all relate directly to the stimulus s: the sensing states ss_s,A and ss_s,B and the sensory representation states srs_s,A and srs_s,B of the two agents; see Fig. 11. Regarding the plasticity, no learning takes place: all W-states remain constant at 0.2 throughout, also within the therapy sessions. This happens because the adaptation speed represented by the H_W-states remains (very close to) 0. The dominant impact on an H_W-state comes via its incoming connection with negative weight -0.5 from the related W-state, which overrules the positive impact from the other incoming connections. All other states stay below an activation value of 5 * 10^-3; see Fig. 12, which displays only the preparation states and the predicted effect sensory representation states for bo and ac. The remaining states follow a similar pattern, being slightly activated during the therapy sessions, but with still lower activation levels (< 5 * 10^-5).
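Why the negative connection from the W-state keeps the learning rate at zero in this scenario can be illustrated with a minimal sketch of an H_W-state that uses the advanced logistic sum function. Only the weight -0.5 and the W-value 0.2 come from the text above; the steepness σ = 10, the threshold τ = 0.5, the positive weights of 1.0, the choice of the two connected base states as positive inputs, and the clipping at 0 are all assumptions made for illustration.

```python
import math


def alogistic(total, sigma, tau):
    """Advanced logistic sum function applied to the summed weighted impact."""
    raw = 1.0 / (1.0 + math.exp(-sigma * (total - tau)))
    offset = 1.0 / (1.0 + math.exp(sigma * tau))
    return (raw - offset) * (1.0 + math.exp(-sigma * tau))


def hw_target(pre, post, w, sigma=10.0, tau=0.5, w_neg=-0.5):
    """Value an H_W-state is pulled towards (hypothetical parameter settings)."""
    # Positive impact from the two connected base states (weights 1.0 assumed),
    # negative impact from the related W-state (weight -0.5 as in the text).
    total = 1.0 * pre + 1.0 * post + w_neg * w
    # Clipping at 0 is an assumption that keeps the learning rate non-negative.
    return max(0.0, alogistic(total, sigma, tau))


low = hw_target(pre=0.005, post=0.005, w=0.2)   # Scenario 2-like activation levels
high = hw_target(pre=0.9, post=0.9, w=0.2)      # strongly activated base states
print(low, high)
```

With base-state activations in the order of 5 * 10^-3, the summed impact is negative and the learning rate stays at 0, so W cannot leave its initial value; with strongly activated base states the same function yields a learning rate close to 1.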

Scenario 3: Strong responsiveness of therapist, moderate responsiveness of client
In this scenario the client has a higher responsiveness to the therapy context: value 0.7 for the weight of the connection from the srs_s state to the ps_ac state; see Figs. 13 to 15. Compared to Scenario 1, the adaptation within the client now goes much faster, so that a breakthrough is already achieved in the second session. Otherwise, the pattern is similar to that of Scenario 1.

Discussion and Conclusions
Our aim was to simulate an adaptive joint decision-making process as a specific form of synchrony between two persons modeled as virtual agents, and to visualize both the execution of the action and the body states (feelings) of each virtual agent. To illustrate the approach, we used a therapeutic setting and the joint decision regarding the closeness of contact between therapist and client. Our simulations showed that initially it was not possible to reach a joint decision (namely close contact), but that after a number of sessions it was, mainly due to learning on the side of the client. Our model was made adaptive by applying a Hebbian learning principle (Hebb, 1949) to the four mirroring connections (for both actions and emotions) of the two virtual agents. As a result, the agents become more responsive to each other over time, so that it becomes easier for them to reach a joint decision than in the nonadaptive case described in (Duell and Treur, 2012; Treur, 2011).
To make the model more realistic, the learning speed itself was not assumed to be constant but was made adaptive in a context-sensitive manner to model metaplasticity (Abraham and Bear, 1996). The modeling approach used is based on states with time-varying activation levels that are connected to each other through temporal-causal relationships. There are sensing, internal and execution states.
Using the self-modeling network modeling option, the first- and second-order adaptivity was modeled based on the same temporal-causal network modeling principles (Treur, 2019, 2020).
In our previous work (Hendrikse, Treur, Wilderjans, Dikker, & Koole, 2022) we developed a computational model that addressed synchrony between two agents.However, there are three main differences with the current paper: (1) this previous model did not consider the more complex internal mental processes that play a role in joint decision-making, (2) it was not adaptive, and (3) no visualization by virtual agents was developed.
On the basis of three illustrative simulations, we conclude that we succeeded in modeling an adaptive joint decision process with an application in a human-like situation, namely a therapeutic setting. These findings might serve as a foundation for the future development of virtual support for therapies. This research opens a number of directions for further research. First, our agent models relied solely on nonverbal communication, whereas language also plays an important role in human communication. Therefore, future agent models could be extended to verbal communication. Second, we only visualized one body state: facial expressions that ranged from neutral to positive. The emotional response system is much more differentiated in humans, and the distinction between, for example, different approach-motivated states like attraction and anger was beyond the scope of the current research. Future agent models could incorporate more emotions, such as anger, disgust and sadness. Third, human participants could be asked to verify the realism of the expressed actions and emotions (such as closeness and happiness) in the visualizations.

Fig. 1. Connectivity of a self-model for Hebbian learning.

Similarly, all other network characteristics ω_X,Y, c_Y(..) and η_Y can be made adaptive by including self-model states for them. For example, an adaptive speed factor η_Y can be represented by a self-model state named H_Y. Such states are generally called H-states. As the outcome of such a process of network reification is also a network model itself, as has been shown in detail in (Treur, 2020, Ch. 10), this self-modeling network construction can easily be applied iteratively to obtain multiple orders of self-models at multiple (first-order, second-order, etc.) self-model levels, for example to model metaplasticity as discussed in Sect. 2; e.g., (Abraham and Bear, 1996). For instance, a second-order self-model may include a second-order self-model state H_WX,Y representing the speed factor η_WX,Y for the (learning) dynamics of the first-order self-model state W_X,Y, which in turn represents the adaptation of the connection weight ω_X,Y; see Fig. 2 for the connectivity. Such states are generally called H_W-states; they can be considered learning rates of the learning modeled by the corresponding states W_X,Y. This can be used to model the second-order adaptation principle 'Adaptation accelerates with increasing stimulus exposure' (Robinson, Harper, & McAlpine, 2016) discussed in Sect. 2. As combination function for such H_W-states, the function alogistic_σ,τ(..) from Table 1 can be used. This second-order adaptation can be interpreted as a context-sensitive form of control over the first-order adaptation: the plasticity only occurs in specific (relevant) contexts. The two cases of self-model states depicted in Fig. 1 and Fig. 2 have been used in the model discussed in Sect. 4 in order to obtain a second-order adaptive network model for joint decision-making applying context-sensitive control of Hebbian learning.

Fig. 2. Connectivity of a second-order self-model for the second-order adaptation principle 'Adaptation accelerates with increasing stimulus exposure' with a first-order self-model for Hebbian learning.

Fig. 3. Connectivity for the network architecture of a single agent A.

Fig. 5. Overall view of the three simulation scenarios. Upper graph: Scenario 1 with breakthrough in the fourth session. Middle graph: Scenario 2 without any breakthrough. Lower graph: Scenario 3 with breakthrough in the second session. A breakthrough can be seen as the es_ac and es_bo states of the client going upwards.

Fig. 6. Scenario 1: simulation case with only the expression and execution states (es_bo and es_ac) for emotion bo and for action ac chosen for the virtualization. It can be seen that the therapist immediately has an open expression and stance, while the client takes a few sessions to open up.

Fig. 7. Scenario 1: the representation and preparation states indicated by srs and ps together with the expression and execution states

Fig. 8. Scenario 1: the adaptation W-states and H_W-states from the first- and second-order self-model levels.

Fig. 9. In this screenshot, the therapist (left) and client (right) walk to their seats in order to start a new session.

Fig. 10. Screenshots taken of the visualization of Scenario 1 for three sessions (from upper to lower) with the therapist (left) and client (right). The therapist has an active posture and happy expression soon after the beginning, in contrast to the client, who has to develop that over a number of sessions.

Fig. 11. Scenario 2: all states that do not stay close to 0. Client and therapist both have low responsiveness to the therapy context: the connections from the srs_s states to the ps_ac states have weight 0.1. Shown are the states for the therapy context stimulus s and the W- and H_W-states modeling plasticity and metaplasticity. The W-states are constant, so no learning takes place; the H_W-states stay very close to 0.

Fig. 12. Scenario 2: the representation and preparation states. Note that the values of all states depicted here are very low: below 5 * 10^-3.

Fig. 13. Scenario 3: general therapy context stimulus s and the actions and expressions of both persons as used in the visualizations. The connections from srs_s to ps_ac for responsiveness have high weights for both client (0.7) and therapist (1.0). The pattern is roughly similar to that of Scenario 1 but much faster: the breakthrough already occurs in session 2.

Fig. 15. Scenario 3: the self-model states modeling plasticity and metaplasticity. Both client and therapist have a high responsiveness to the general therapy context: client weight 0.7 and therapist weight 1.0 for the connection from the srs_s states to the ps_ac states.

Fig. 4. Network architecture of the interaction between the two virtual agents; see also (Van Ments and Treur, 2021).

Table 2
States and their explanation

State                   Explanation
srs_B,ac,A              Sensory representation state for B tending to do action ac perceived by agent A
srs_A,ac,B              Sensory representation state for A tending to do action ac perceived by agent B
srs_B,bo,A              Sensory representation state for body state bo of B by agent A
srs_A,bo,B              Sensory representation state for body state bo of A by agent B
srs_A,bo,A              Sensory representation state for own body state bo of A by agent A
srs_B,bo,B              Sensory representation state for own body state bo of B by agent B
ps_ac,A                 Preparation state for action ac by agent A
ps_ac,B                 Preparation state for action ac by agent B
ps_bo,A                 Preparation state for emotional response bo by agent A
ps_bo,B                 Preparation state for emotional response bo by agent B
os_B,s,ac,e,A           Other-Ownership state for doing action ac in the context of B, s and e by agent A
os_A,s,ac,e,B           Other-Ownership state for doing action ac in the context of A, s and e by agent B
os_B,e,bo,A             Other-Ownership state for emotion bo in the context of B and e by agent A
os_A,e,bo,B             Other-Ownership state for emotion bo in the context of A and e by agent B
os_A,s,ac,e,A           Self-Ownership state for doing action ac in the context of A, s and e by agent A
os_B,s,ac,e,B           Self-Ownership state for doing action ac in the context of B, s and e by agent B
os_A,e,bo,A             Self-Ownership state for emotion bo in the context of A and e by agent A
os_B,e,bo,B             Self-Ownership state for emotion bo in the context of B and e by agent B
ec_B,s,ac,e,A           Communication of action ac in the context of B, s and e by agent A
ec_A,s,ac,e,B           Communication of action ac in the context of A, s and e by agent B
ec_B,e,bo,A             Communication of emotion bo in the context of B and e by agent A
ec_A,e,bo,B             Communication of emotion bo in the context of A and e by agent B
es_ac,A                 Execution state of action ac by agent A
es_ac,B                 Execution state of action ac by agent B
es_bo,A                 Execution state of body state bo by agent A
es_bo,B                 Execution state of body state bo by agent B
W_srsB,ac,A,psac,A      First-order connectivity self-model state representing the weight ω of the connection from srs_B,ac,A to ps_ac,A
W_srsA,ac,B,psac,B      First-order connectivity self-model state representing the weight ω of the connection from srs_A,ac,B to ps_ac,B
W_srsB,bo,A,psbo,A      First-order connectivity self-model state representing the weight ω of the connection from srs_B,bo,A to ps_bo,A
W_srsA,bo,B,psbo,B      First-order connectivity self-model state representing the weight ω of the connection from srs_A,bo,B to ps_bo,B
H_W_srsB,ac,A,psac,A    Second-order timing self-model state representing the speed factor (learning rate) η of the adaptive weight ω of the connection from srs_B,ac,A to ps_ac,A
H_W_srsA,ac,B,psac,B    Second-order timing self-model state representing the speed factor (learning rate) η of the adaptive weight ω of the connection from srs_A,ac,B to ps_ac,B
H_W_srsB,bo,A,psbo,A    Second-order timing self-model state representing the speed factor (learning rate) η of the adaptive weight ω of the connection from srs_B,bo,A to ps_bo,A
H_W_srsA,bo,B,psbo,B    Second-order timing self-model state representing the speed factor (learning rate) η of the adaptive weight ω of the connection from srs_A,bo,B to ps_bo,B

Table 3
Variations of the stimulus responsiveness strengths for the three scenarios considered

Responsiveness strength for therapy context s
Scenario      Client A    Therapist B
Scenario 1    0.1         1
Scenario 2    0.1         0.1
Scenario 3    0.7         1

Aggregation role matrix mcfp (for combination function parameters) of the main simulation.
Timing role matrix ms (for speed factors) of the main simulation.