Neural networks with transient state dynamics

We investigate dynamical systems characterized by a time series of distinct semi-stable activity patterns, as observed in cortical neural activity. We propose and discuss a general mechanism allowing for an adiabatic continuation between attractor networks and a specific adjoined transient-state network, which is strictly dissipative. Dynamical systems with transient states retain functionality when their working point is autoregulated, avoiding prolonged periods of stasis or drifting into a regime of rapid fluctuations. We show, within a continuous-time neural network model, that a single local updating rule for online learning allows simultaneously (i) for information storage via unsupervised Hebbian-type learning, (ii) for adaptive regulation of the working point and (iii) for the suppression of runaway synaptic growth. Simulation results are presented; the spontaneous breaking of time-reversal symmetry and link symmetry are discussed.


Introduction
Dynamical systems are often classified with respect to their long-time behaviours, which might be, e.g., chaotic or regular [1]. Of special interest are attractors, cycles and limit cycles, as they determine the fate of all orbits starting within their respective basins of attraction.
Attractor states play a central role in the theory of recurrent neural networks, serving the role of memories with the capability to generalize and to reconstruct a complete memory from partial initial information [2]. Attractor states in recurrent neural networks face however a fundamental functional dichotomy, whenever the network is considered as a functional subunit of an encompassing autonomous information processing system, viz an autonomous cognitive system [3]. The information processing comes essentially to a standstill once the trajectory closes in at one of the attractors. Restarting the system 'by hand' is a viable option for technical applications of neural networks, but not within the context of autonomously operating cognitive systems.
One obvious way out of this dilemma would be to consider only dynamical systems without attractor states, i.e. with a kind of continuously ongoing 'fluctuating dynamics', as illustrated in figure 1, which might possibly be chaotic in the strict sense of dynamical system theory. The problem is then, however, the decision-making process. Without well-defined states, which last for certain minimal periods, the system has no definite information-carrying states on which it could base the generation of its output signals. It is interesting to note, in this context, that indications for quasi-stationary patterns in cortical neural activity have been observed [4]–[6]. These quasi-stationary states can be analysed using multivariate time-series analysis, indicating self-organized patterns of brain activity [7]. Interestingly, studies of EEG recordings have been interpreted in terms of brain states showing an aperiodic evolution through sequences of attractors which, on access, support the experience of remembering [8]. These findings suggest that 'transient-state dynamics', as illustrated in figure 1, might be of importance for cortical firing patterns.
It is possible, from the viewpoint of dynamical system theory, to consider transient states as well-defined periods when the orbit approaches an attractor ruin. With a transient attractor, or attractor ruin, we denote here a point in phase space which could be turned continuously into a stable attractor by tuning certain of the parameters entering the evolution equations of the dynamical system. The dynamics slows down close to the attractor ruin and well-defined transient states emerge within the ensemble of dynamical variables. The notion of transient-state dynamics is related conceptually to chaotic itinerancy (see [9] and references therein), a term used to characterize dynamical systems for which chaotic high-dimensional orbits stay intermittently close to low-dimensional attractor ruins for certain periods. Instability due to dynamic interactions or noise is necessary for the appearance of chaotic itinerancy.
Having argued that transient-state dynamics might be of importance for a wide range of real-world dynamical systems, the question is then how to generate such a kind of dynamical behaviour in a controllable fashion and in a manner applicable to a variety of starting systems, viz we are interested in neural networks which generate transient-state dynamics in terms of a meaningful time series of states approaching arbitrarily close predefined attractor ruins.
The approach we will follow here is to start with an original attractor neural network and to transform then the set of stable attractors into transient attractors by coupling to auxiliary local variables, which we denote 'reservoirs', governed by long timescales. We note that related issues have been investigated in the context of discrete-time, phase coupled oscillators [10], for networks aimed at language processing in terms of 'latching transitions' [11, 12], and in the context of 'winnerless competitions' [13]–[15]. Further examples of neural networks capable of generating a time series of subsequent states are neural networks with time-dependent asymmetric synaptic strengths [16] or dynamical thresholds [17]. We also note that the occurrence of spontaneous fluctuating dynamics has been studied [18], especially in relation to the underlying network geometry [19].

DEUTSCHE PHYSIKALISCHE GESELLSCHAFT
An intrinsic task of neural networks is to learn and to adapt to incoming stimuli. This implies, for adaptive neural networks, a continuous modification of their dynamical properties. The learning process could consequently take the network, if no precautions are taken, out of its intended working regime, the regime of transient-state dynamics. Here we will show that it is possible to formulate local learning rules which keep the system in its proper dynamical state by optimizing continuously its own working point. To be concrete, let us denote with t̄ the average duration of the quasi-stable transient states and with Δt the typical time needed for the transition from one quasi-stationary state to the next. The dynamical working point can then be defined as the ratio Δt/t̄. These timescales, t̄ and Δt, result, for the network of cortical neurons, from the properties of the individual neurons, which are essentially time-independent, and from the synaptic strengths, which are slow dynamical variables subject to Hebbian-type learning [20]. It then follows that the modifications of the inter-neural synaptic strengths have a dual functionality: on one side they are involved in memory storage tasks [20], and on the other side they need to retain the working point in the optimal regime. Here we show that this dual functionality can be achieved within a generalized neural network model. We show that working-point optimization is obtained when the Hebbian learning rule is reformulated as an optimization procedure, resulting in a competition among the set of synapses leading to an individual neuron. The resulting learning rule turns out to be closely related to rules found to optimize the memory-storage capacity [21].

Clique encoding
Neural networks with sparse coding, viz with low mean firing rates, have very large memory-storage capacities [22]. Sparse coding results, in extremis, in a 'one-winner-take-all' configuration, for which a single unit encodes exactly one memory. In this limit the storage capacity is, however, reduced again, growing only linearly with the network size, as in the original Hopfield model [23]. Here we opt for the intermediate case of 'clique encoding'. A clique is, in terms of graph theory, a fully interconnected subgraph, as illustrated in figure 2 for a 7-site network. Clique encoding corresponds to a 'several-winners-take-all' set-up. All members of the winning clique mutually excite each other while suppressing the activities of all out-of-clique neurons to zero.
We note that the number of cliques can be very large. For illustration let us consider a random Erdős–Rényi graph with N vertices and linking probability p. The overall number of cliques containing Z vertices is then statistically given by

N_Z ≈ (N choose Z) p^{Z(Z−1)/2} (1 − p^Z)^{N−Z},        (1)

where p^{Z(Z−1)/2} is the probability of having Z sites of the graph fully interconnected by Z(Z − 1)/2 edges and where the last term is the probability that none of the N − Z out-of-clique vertices is simultaneously connected to all Z sites of the clique. Networks with clique encoding are especially well suited for transient-state dynamics, as we will discuss further below, and are biologically plausible. Extensive sensory preprocessing is known to occur in the respective cortical areas of the brain [20], leading to representations of features and objects by individual neurons or small cell assemblies. In this framework a site, viz a neural centre, of the effective neural network considered here corresponds to such a small cell assembly and a clique to a stable representation of a memory, by binding together a finite set of features extracted by the preprocessing algorithms from the sensory input stream.
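The combinatorial estimate (1) is easily evaluated numerically; the following is a minimal sketch (the helper name `expected_cliques` is ours):

```python
from math import comb

def expected_cliques(N: int, p: float, Z: int) -> float:
    """Expected number of Z-cliques in an Erdos-Renyi graph G(N, p)
    whose members are not all simultaneously linked to an outside vertex."""
    # comb(N, Z): ways to choose the Z candidate vertices
    # p**(Z*(Z-1)/2): all Z(Z-1)/2 internal edges present
    # (1 - p**Z)**(N - Z): no outside vertex connects to all Z members
    return comb(N, Z) * p ** (Z * (Z - 1) // 2) * (1 - p ** Z) ** (N - Z)
```

For moderate p the count grows rapidly with N, illustrating why even small networks can store a large number of clique-encoded memories.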

Continuous time dynamics
For our study of possible mechanisms of transient-state dynamics in the context of neural networks we consider i = 1, . . . , N artificial neurons with rate encoding x_i(t) and continuous time t ∈ [0, ∞). Let us comment briefly on the last point. The majority of research in the field of artificial neural networks deals with the case of discrete time t = 0, 1, 2, . . . [20]. We are however interested, as discussed in the Introduction, in networks exhibiting autonomously generated dynamical behaviours, as they typically occur in the context of complete autonomous cognitive systems. We are therefore interested in networks with update rules compatible with the interaction with other components of a cognitive system. Discrete-time updating is not suitable in this context, since the resulting dynamical characteristics (i) depend on the choice of synchronous versus asynchronous updating and (ii) are strongly influenced when effective recurrent loops arise due to the coupling to other components of the autonomous cognitive system. We therefore consider and study here a model with continuous time.

Neural network model
We denote the state variables encoding the activity level by x_i(t) and assume them to be continuous variables, x_i ∈ [0, 1]. Additionally, we introduce for every site a variable ϕ_i(t) ∈ [0, 1], termed 'reservoir', which serves as a fatigue memory facilitating the self-generated time series of transient states. We consider the following set of differential equations:

ẋ_i = r_i (1 − x_i) Θ(r_i) + r_i x_i Θ(−r_i),        (2)

r_i = Σ_j [ f_w(ϕ_i) Θ(w_ij) w_ij + z_ij f_z(ϕ_j) ] x_j,        (3)

ϕ̇_i = Γ_ϕ^+ (1 − ϕ_i)(1 − x_i/x_c) Θ(x_c − x_i) − Γ_ϕ^− ϕ_i Θ(x_i − x_c),        (4)

z_ij = −|z| Θ(−w_ij).        (5)

We now discuss some properties of (2)-(5), which are suitably modified Lotka-Volterra equations.
1. Normalization. Equations (2)-(4) respect the normalization x_i, ϕ_i ∈ [0, 1], due to the prefactors in (2) and (4) for the respective growth and depletion processes. Θ(r) is the Heaviside step function: Θ(r) = 0 for r < 0 and Θ(r) = 1 for r > 0.
2. Synaptic strength. The synaptic strength is split into an excitatory contribution ∝ w_ij and an inhibitory contribution ∝ z_ij, with w_ij being the primary variable: the inhibition z_ij is present only when the link is not excitatory (5).
3. Units. We have used z ≡ −1, viz |z| = 1, throughout the paper, which then defines the inverse reference unit for the time development.
4. Reservoir dynamics. The reservoirs ϕ_i deplete while a site is active and recover only once the activity level x_i of a given site has dropped below x_c, which defines a site to be active when x_i > x_c.
5. Stabilization of transitions. The factor (1 − x_i/x_c) occurring in the reservoir growth process, see the rhs of (4), serves for a stabilization of the transition between two subsequent memory states. When the activity level x_i of a given centre i drops below x_c, it cannot be reactivated immediately; the reservoir cannot fill up again for x_i ≲ x_c, due to the (1 − x_i/x_c) in (4).
6. Separation of timescales. A separation of timescales is obtained when the Γ_ϕ^± are much smaller than the typical strength of an active excitatory link, i.e. of a typical w_ij > 0, leading to transient-state dynamics. Once the reservoir of a winning clique is depleted, it loses, via f_z(ϕ), its ability to suppress other sites and the mutual intra-clique excitation is suppressed via f_w(ϕ). The reservoir functions f_α(ϕ), with α = w, z, which we used, have the form of generalized Fermi functions, interpolating smoothly between their limiting values for full (ϕ → 1) and depleted (ϕ → 0) reservoirs.
[Figure 3. Right panel: distribution of the synaptic strengths, for the inhibitory links z_ij < −|z| and the active excitatory links 0 < w_ij < w̄ leading to clique encoding. Note that w̄ is not a strict upper bound, due to the optimization procedure (11). The shaded area just below zero is related to the inactive w_ij, see equations (5) and (11).]
7. Absence of stationary solutions. There are no stationary solutions with ẋ_i = 0 = ϕ̇_i (i = 1, . . . , N) for equations (2) and (4), whenever the Γ_ϕ^± > 0 do not vanish and for any non-trivial coupling functions f_{w/z}(ϕ) ∈ [0, 1].
When decoupling the activities and the reservoir by setting f w/z (ϕ) ≡ 1 one obtains stable attractors with x i = 1/0 and ϕ i = 0/1 for sites belonging/not-belonging to the winning clique, compare figure 4.
In figure 2 the transient-state dynamics resulting from equations (2)-(5) is illustrated. Presented in figure 2 are data for the autonomous dynamics in the absence of external sensory signals; we will discuss the effect of external stimuli further below. We present in figure 2 only data for a very small network, containing seven sites, which can be easily represented graphically. We have also performed extensive simulations for very large networks, containing several thousands of sites, and found stable transient-state dynamics.
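As an illustration of how such a simulation can be set up, the following is a minimal Euler-integration sketch of the model dynamics. The sigmoidal form of the reservoir functions, the threshold x_c = 0.85 and the two-clique layout are illustrative assumptions; the reservoir rates Γ_ϕ^+ = 0.015, Γ_ϕ^− = 0.005 and the active link strength w_ij = 0.12 follow the values quoted in the text.

```python
import numpy as np

def theta(r):
    """Heaviside step function Θ(r)."""
    return (r > 0).astype(float)

def fermi(phi, phi_c=0.15, beta=30.0, f_min=0.05):
    """Assumed sigmoidal reservoir function: ~1 for a full reservoir,
    ~f_min for a depleted one (the paper's exact form may differ)."""
    return f_min + (1.0 - f_min) / (1.0 + np.exp(-beta * (phi - phi_c)))

def simulate(w, steps=4000, dt=0.05, xc=0.85, z=1.0,
             g_plus=0.015, g_minus=0.005, seed=0):
    """Euler integration of the activities x_i and reservoirs phi_i."""
    rng = np.random.default_rng(seed)
    N = w.shape[0]
    z_link = -z * theta(-w)              # eq. (5): inhibition where w_ij < 0
    x = rng.uniform(0.1, 0.9, N)
    phi = np.ones(N)
    traj = np.empty((steps, N))
    for t in range(steps):
        fw, fz = fermi(phi), fermi(phi)
        # eq. (3): growth rates with reservoir-modulated couplings
        r = (fw[:, None] * theta(w) * w + z_link * fz[None, :]) @ x
        # eq. (2): bounded growth/decay of the activities
        x = x + dt * (theta(r) * r * (1.0 - x) + theta(-r) * r * x)
        # eq. (4): slow reservoir depletion and recovery
        phi = phi + dt * (g_plus * (1.0 - phi) * (1.0 - x / xc) * theta(xc - x)
                          - g_minus * phi * theta(x - xc))
        x, phi = np.clip(x, 0.0, 1.0), np.clip(phi, 0.0, 1.0)
        traj[t] = x
    return traj

# two overlapping cliques {0,1,2} and {2,3,4}; active links w_ij = 0.12,
# inactive baseline -0.01, vanishing diagonal
N = 5
w = -0.01 * np.ones((N, N))
for clique in [(0, 1, 2), (2, 3, 4)]:
    for i in clique:
        for j in clique:
            if i != j:
                w[i, j] = 0.12
np.fill_diagonal(w, 0.0)
traj = simulate(w)
```

The clipping step only guards against discretization error; in the continuous-time limit the prefactors in (2) and (4) keep x_i and ϕ_i inside [0, 1] by themselves.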

Role of the reservoir
The dynamical system discussed here represents, in the first place, a top-down approach to cognitive systems; a one-to-one correspondence with cortical structures is not intended. The set-up is however inspired by biological analogies and we may identify the sites i of the artificial neural network described by equation (2) not with single neurons, but with neural assemblies or neural centres. The reservoir variables ϕ_i(t) could therefore be interpreted as effective fatigue processes taking place in continuously active neural assemblies, the winning coalitions.
It has been proposed [24] that the neural coding used for the binding of heterogeneous sensory information in terms of distinct and recognizable objects might be temporal in nature. Within this temporal coding hypothesis, which has been investigated experimentally [25], neural assemblies fire in phase, viz synchronously, when defining the same object and asynchronously when encoding different objects.
[Figure 4. The attractors of the original network, viz when the coupling to the reservoir is turned off by setting f_{w/z}(ϕ) ≡ 1, correspond to x_i = 1/0 and ϕ_i = 0/1 (i = 1, . . . , N) for members/non-members of the winning clique. A finite coupling to the local reservoirs ϕ_i leads to orbits {x_i(t), ϕ_i(t)} which are attracted by the attractor ruins for short timescales and repelled for long timescales. This is due to a separation of timescales, as the time evolution of the reservoirs ϕ_i(t) occurs on timescales substantially slower than that of the primary dynamical variables x_i(t).]
There is a close relation between objects and memories in general. An intriguing possibility is therefore to identify the memories of the transient-state network investigated in the present approach with the synchronously firing neurons of the temporal coding theory. The winning coalition is characterized by high reservoir levels, which would then correspond to the degree of synchronization within the temporal encoding paradigm, and the reservoir depletion time ∼ 1/Γ_ϕ^− would correspond to the decoherence time of the object-binding neurons.
We note that this analogy cannot, however, be carried too far, since synchronization is at its basis a cooperative effect, whereas the reservoir levels describe single-unit properties. In terms of a popular physics phrase, one might speak of a 'poor man's' approach to synchronization, via coupling to a fatigue variable.

Dissipative dynamics
The reason for the observed numerical and dynamical robustness can be traced back to the relaxational nature of the dynamics. For short timescales we can consider the reservoir variables {ϕ_i} to be approximately constant and the system relaxes into the next clique attractor ruin. Once close to a transient attractor, the {x_i} are essentially constant, viz close to one/zero, and the reservoir slowly depletes. The dynamics is robust against noise, as fluctuations affect only details of both relaxational processes, but not their overall behaviour.
To be precise we note that the phase space contracts with respect to the reservoir variables, namely

∂ϕ̇_i/∂ϕ_i = −Γ_ϕ^+ (1 − x_i/x_c) Θ(x_c − x_i) − Γ_ϕ^− Θ(x_i − x_c) ≤ 0,

where we have used (4). We note that the diagonal contributions to the link matrices vanish, z_ii = 0 = w_ii, and therefore ∂r_i/∂x_i = 0. The phase space contracts consequently also with respect to the activities,

∂ẋ_i/∂x_i = −r_i Θ(r_i) + r_i Θ(−r_i) = −|r_i| ≤ 0,

where we have used (2). The system is therefore strictly dissipative, in the absence of learning and external perturbations, leading to the observed numerically robust behaviour.

Strict transient-state dynamics
The self-generated transient-state dynamics shown in figure 2 exhibits well-characterized plateaus in the x_i(t), since small values have been used for the depletion and the growth rate of the reservoir, Γ_ϕ^− = 0.005 and Γ_ϕ^+ = 0.015. The simulations presented in figure 2 were performed using w_ij = 0.12 for all nonzero excitatory interconnections.
We define a dynamical system to have 'strict transient-state dynamics' if there exists a set of control parameters allowing one to turn the transient states adiabatically into stable attractors. Equations (2)-(5) fulfil this requirement: for Γ_ϕ^− → 0 the average duration t̄ of the steady-state plateaus observed in figure 2 diverges.
Alternatively, by selecting appropriate values for Γ_ϕ^− and Γ_ϕ^+, it is possible to regulate the 'speed' of the transient-state dynamics, an important consideration for applications. For a working cognitive system, such as the brain, it is enough that the transient states are stable just for the certain minimal period needed to identify the state and to act upon it. Anything longer would just be a 'waste of time'.

Universality
We note that the mechanism for the generation of stable transient-state dynamics proposed here is universal in the sense that it can be applied to a wide range of dynamical systems in a frozen state, i.e. systems whose long-time behaviour is determined by attractors and cycles.
Physically, the mechanism we propose here is to embed the phase space {x i } of an attractor network into a larger space, {x i , ϕ j }, by coupling to additional local slow variables ϕ i . Stable attractors are transformed into attractor ruins since the new variables allow the system to escape the basin of the original attractor {x i = 1/0, ϕ j = 0/1} (for in-clique/out-of-clique sites) via local escape processes which deplete the respective reservoir levels ϕ i (t). Note that the embedding is carried out via the reservoir functions f z/w (ϕ) in equation (3) and that the reservoir variables keep a slaved dynamics (4) even when the coupling is turned off by setting f z/w (ϕ) → 1 in equation (3).
This mechanism is illustrated in figure 4. Locality is an important ingredient for this mechanism to work. The trajectories would otherwise not come close to any of the attractor ruins again, viz to the original attractors, being repelled by all of them with similar strengths and fluctuating dynamics of the kind illustrated in figure 1 would result.

Cycles and time-reversal symmetry
The systems illustrated in figures 2 and 5 are very small and the transient-state dynamics soon settles into a cycle of attractor ruins, since no incoming sensory signals are considered in the respective simulations. For networks containing a larger number of sites the number of attractors can however be very large, and so is the resulting cycle length. We performed simulations for a 100-site network, containing 713 clique-encoded memories. We found no cyclic behaviour even for sequences containing up to 4400 transient states. We note that the system does not necessarily retrace its own trajectory once a given clique is stabilized for a second time, an event which needs to occur in any finite system, the reason being that the distribution of reservoir levels is in general different when a given clique is revisited. We note that time-reversal symmetry is 'spontaneously' broken, in the sense that repetitive transient-state dynamics of the type A → B → A → B → · · · does generally not arise. The reason is simple: once the first clique is deactivated, its respective reservoir levels need a certain time to fill up again, compare figure 2. Time-reversal symmetry would be recovered, however, in the limit Γ_ϕ^+ ≫ Γ_ϕ^−, i.e. when the reservoirs would be refilled much faster than they are depleted.
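For such cycle analyses the trajectory must first be reduced to its symbolic sequence of winning coalitions. A small helper of this kind might look as follows (thresholding at an assumed x_c = 0.85):

```python
def clique_sequence(traj, xc=0.85):
    """Collapse a trajectory of activity vectors into the ordered list
    of distinct winning coalitions, i.e. the sets of sites whose
    activity x_i exceeds the threshold x_c."""
    seq, prev = [], None
    for x in traj:
        clique = frozenset(i for i, xi in enumerate(x) if xi > xc)
        if clique and clique != prev:
            seq.append(clique)
            prev = clique
    return seq
```

During a transition the momentary coalition may briefly mix members of two cliques; for a careful cycle analysis one would additionally keep only coalitions that persist for a minimal number of steps.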

Reproducible sequence generation
Animals need to generate sequences of neural activities for a wide range of purposes, e.g. for movements or for periodic internal muscle contractions, the heartbeat being a prime case. These sequences need to be successions of well-defined firing patterns, usable to control actuators, viz the muscles. The question then arises under which condition a dynamical system generates reproducible sequences of well-defined activity patterns, i.e. controlled time series of transient states [26,27].

There are two points worth noting in this context.
1. The dynamics described by equations (2)-(4) works fine for randomly selected link matrices w_ij, which may, or may not, change with time. In particular one can select the cliques specifically in order to induce the generation of a specific succession of transient states; an example is presented in figure 5. The network is capable, as a matter of principle, of robustly generating large numbers of different sequences of transient states. For geometric arrangements of the network sites, and of the links w_ij, one finds waves of transient states sweeping through the system.
2. In section 3 we will discuss how appropriate w_ij can be learned from training patterns presented to the network by an external teacher. We will concentrate in section 3 on the training and learning of individual memories, viz of cliques, but suitable sequences of training patterns could be used also for learning temporal sequences of memories.

Autonomous online learning
An external stimulus, {b_i^(ext)(t)}, influences the activities x_i(t) of the respective neural centres. This corresponds to a change of the respective growth rates,

r_i → r_i + f_z(ϕ_i) b_i^(ext),        (6)

compare equation (3), where f_z(ϕ_i) is an appropriate coupling function, depending on the local reservoir level ϕ_i. When the effect of the external stimulus is strong, namely when f_z b_i^(ext) is large, it will in general lead to an activation x_i → 1 of the respective neural centre i. A continuously active stimulus does not convey new information and should, on the other hand, lead to habituation, viz to a reduced influence on the system. A strong, continuously present stimulus leads to a prolonged high activity level x_i → 1 of the involved neural centres, leading via (4) to a depletion of the respective reservoir levels, on a timescale given by the inverse reservoir depletion rate, 1/Γ_ϕ^−. Habituation is then mediated by the coupling function f_z(ϕ_i) in (6), since f_z(ϕ_i) becomes very small for ϕ_i → 0, compare figure 3. The habituation incorporated in (6) therefore allows the system to turn its 'attention' to other competing stimuli, with novel stimuli having a higher chance to affect the ongoing transient-state dynamics.
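The reservoir-gated stimulus coupling and the resulting habituation can be sketched as follows; the sigmoidal form and parameters of f_z are illustrative assumptions:

```python
import numpy as np

def f_z(phi, phi_c=0.15, beta=30.0):
    """Assumed sigmoidal coupling function: ~1 for a full reservoir,
    ~0 for a depleted one."""
    return 1.0 / (1.0 + np.exp(-beta * (phi - phi_c)))

def stimulated_rate(r, b_ext, phi):
    """Growth rate including an external stimulus b_ext, gated by the
    local reservoir phi: a depleted reservoir silences the stimulus,
    which is the habituation mechanism described above."""
    return r + f_z(phi) * b_ext
```

A full reservoir (ϕ = 1) lets the stimulus through essentially unattenuated; after prolonged activation (ϕ → 0) the same stimulus has almost no effect.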
We now provide a set of learning rules allowing the system to acquire new patterns on the fly, viz during its normal phase of dynamical activity. The alternative, modelling networks having distinct periods of learning and of performance, is of widespread use for technical applications of neural networks, but it is not of interest in our context of continuously active cognitive systems.

Short-and long-term synaptic plasticities
There are two fundamental considerations for the choice of synaptic plasticities adequate for neural networks with transient-state dynamics.
1. Learning is a very slow process without a short-term memory. Training patterns need to be presented to the network over and over again until substantial synaptic changes are induced [20]. A short-term memory can speed up the learning process substantially, as it stabilizes external patterns and hence gives the system time to consolidate long-term synaptic plasticity.
2. Systems using sparse coding are based on a strong inhibitory background: the average inhibitory link-strength |z| is substantially larger than the average excitatory link-strength w̄, i.e. |z| ≫ w̄.
It is then clear that gradual learning is effective only when it affects dominantly the excitatory links: small changes of large parameters do not lead to new transient attractors, nor do they influence the cognitive dynamics substantially.
It then follows that it is convenient to split the synaptic plasticities into two parts,

w_ij = w_ij^S + w_ij^L,        (7)

where the w_ij^{S/L} correspond to the short-term and to the long-term synaptic plasticities, respectively.

Negative baseline
Equation (5), z_ij = −|z| Θ(−w_ij), implies that the inhibitory link-strength is either zero or −|z|, but is not changed directly during learning, in accordance with the above discussion. We may therefore consider two kinds of 'excitatory link strengths'. When w_ij acquires, during learning, a positive value, the corresponding inhibitory link z_ij is turned off via equation (5) and the excitatory link w_ij determines the value of the respective term, f_w(ϕ_i) Θ(w_ij) w_ij + z_ij f_z(ϕ_j), in equation (3). We have used a small negative baseline of W_L^(min) = −0.01 throughout the simulations.

Short-term memory dynamics
It is reasonable to have a maximal possible value W_S^(max) for the short-term synaptic plasticities. We consider therefore the following Hebbian-type learning rule:

ẇ_ij^S = Γ_S^+ f_z(ϕ_i) f_z(ϕ_j) Θ(x_i − x_c) Θ(x_j − x_c) (W_S^(max) − w_ij^S) − Γ_S^− w_ij^S [1 − Θ(x_i − x_c) Θ(x_j − x_c)],        (8)

viz w_ij^S(t) increases rapidly, with rate Γ_S^+, when both the pre- and the post-synaptic neural centres are active, viz when their respective activities are above x_c. Otherwise it decays to zero, with a rate Γ_S^−. The coupling functions f_z(ϕ) preempt prolonged self-activation of the short-term memory: when the pre- and the post-synaptic centres are active long enough to deplete their respective reservoir levels, the short-term memory is shut off via the f_z(ϕ).
In figure 6 we present the time evolution of some selected w_ij^S(t), for a simulation using the network illustrated in figure 2. The short-term memory is activated in three cases.
1. When an existing clique, viz a clique encoded in the long-term memory w_ij^L, is activated, as is the case for (0, 1) in the data presented in figure 6, the respective intra-clique w_ij^S are also activated. This behaviour is a side effect since, for the parameter values chosen here, the magnitudes of the short-term link-strengths are substantially smaller than those of the long-term link-strengths.
2. During the transient-state dynamics there is a certain overlap between the currently active clique and the subsequently active clique. For this short time span the short-term plasticities w_ij^S of the synapses linking these two cliques are activated. An example is the link (2, 4) for the simulation presented in figure 6.
3. When external stimuli act on two sites not connected by an excitatory long-term memory link w_ij^L, the short-term plasticity w_ij^S makes a qualitative difference. It transiently stabilizes the corresponding link and the respective link becomes a new clique (i, j), either by itself or as part of an enlarged and already existing clique. An example is the link (3, 6) for the simulation presented in figure 6. Note however that, without subsequent transferral into the long-term memory, these new states would disappear with a rate Γ_S^− once the causing external stimulus was gone.
The last point is the one of central importance, as it allows for temporal stabilization of new patterns present in the sensory input stream.
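A discrete-time sketch of a short-term plasticity update of this kind is given below; the rates Γ_S^±, the bound W_S^(max) and the gating function are illustrative assumptions, not the paper's values:

```python
import numpy as np

def f_z(phi, phi_c=0.15, beta=30.0):
    """Assumed sigmoidal reservoir gating function."""
    return 1.0 / (1.0 + np.exp(-beta * (phi - phi_c)))

def step_short_term(wS, x, phi, dt, xc=0.85,
                    wS_max=0.05, gS_plus=0.1, gS_minus=0.005):
    """One Euler step of a short-term plasticity rule: growth towards
    wS_max while both centres are active and their reservoirs are full,
    decay towards zero otherwise."""
    active = np.outer(x > xc, x > xc).astype(float)   # both centres above x_c
    gate = np.outer(f_z(phi), f_z(phi))               # shut-off when reservoirs deplete
    dw = (gS_plus * gate * active * (wS_max - wS)
          - gS_minus * (1.0 - active) * wS)
    return wS + dt * dw
```

With both centres active and reservoirs full, the link grows quickly towards its bound; once activity drops, the same link decays back to zero at the slower rate, mimicking the transient stabilization described above.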

Long-term memory dynamics
Information processing dynamical systems retain their functionalities only when they keep their dynamical properties within certain regimes; they need to regulate their own working point. For the type of systems discussed here, exhibiting transient-state dynamics, the working point is, as discussed in the Introduction, defined as the time Δt the system needs for a transition from one quasi-stationary state to the subsequent one, relative to the length t̄ of the individual quasi-stationary states, which is given by 1/Γ_ϕ^−. The cognitive information processing within neural networks occurs on short to intermediate timescales. For these processes to work well the mean overall synaptic plasticities, viz the average strength of the long-term memory links w_ij^L, need to be regulated homeostatically. The average magnitude of the growth rates r_i, see equation (3), determines the time Δt needed to complete a transition from one winning clique to the next transient state. It therefore constitutes a central quantity regulating the working point of the system, since t̄ ∼ 1/Γ_ϕ^− is fixed: the reservoir depletion rate Γ_ϕ^− is not affected by the learning processes, which affect exclusively the inter-neural synaptic strengths.
The bare growth rates r_i(t) are quite strongly time-dependent, due to the time dependence of the post-synaptic reservoirs entering the reservoir function f_w(ϕ_i), see equation (3). The effective incoming synaptic signal strength

r̃_i = Σ_j [ Θ(w_ij) w_ij + z_ij f_z(ϕ_j) ] x_j,        (9)

which is independent of the post-synaptic reservoir ϕ_i, is a more convenient local control parameter. The working point of the cognitive system is optimal when the effective incoming signal is, on the average, of comparable magnitude r^(opt) for all sites,

r̃_i → r^(opt).        (10)

The long-term memory has two tasks: to extract and encode patterns present in the external stimulus, equation (6), via unsupervised learning, and to keep the working point of the dynamical system in its desired range. Both tasks can be achieved by a single local learning rule,

ẇ_ij^L = Γ_L^(opt) Δr_i Θ(x_i − x_c) Θ(x_j − x_c) (w_ij^L − W_L^(min)),        (11)

supplemented by a weak decay term,

ẇ_ij^L → ẇ_ij^L − Γ_L^− d(w_ij^L),        (12)

where Δr_i = r^(opt) − r̃_i. For the numerical simulations we used Γ_L^(opt) = 0.0008, W_L^(min) = −0.01 and r^(opt) = 0.2. We now comment on some properties of these evolution equations for w_ij^L(t).
1. Hebbian learning. The learning rule (11) is local and of Hebbian type. Learning occurs only when the pre- and the post-synaptic neuron are active, viz when their respective activity levels are above the threshold x_c. Weak forgetting, i.e. the decay of seldom used links, is governed by (12). The function d(w_ij^L) determines the functional dependence of forgetting on the actual synaptic strength; we have used d(w_ij^L) = Θ(w_ij^L) w_ij^L for simplicity.
2. Synaptic competition. When the effective incoming signal r̃_i is weak/strong, relative to the optimal value r^(opt), the active links are reinforced/weakened, with W_L^(min) being the minimal value for the w_ij. The baseline W_L^(min) is slightly negative, compare figures 3 and 7. The Hebbian-type learning then takes place in the form of a temporal competition among incoming synapses: frequently active incoming links will gain strength, on average, at the expense of rarely used links.
[Figure 7. Time evolution of selected long-term link-strengths w_ij^L; an external stimulus at the sites (3, 6), entering via (3) and (6), acts for t ∈ [400, 410] with strength b^(ext) = 3.6.]
The stimulus pattern (3, 6) has been learned by the system, as w_{3,6} and w_{6,3} turned positive during the learning interval ≈ [400, 460]. The learning interval is substantially longer than the bare stimulus length due to the activation of the short-term memory. The decay of certain w_ij^L in the absence of an external stimulus is due to forgetting (12), which should normally be a very weak effect, but which has been chosen here to be a sizeable Γ_L^− = 0.1, for illustrational purposes.
3. Fast learning of new patterns. In figure 7 the time evolution of some selected w^L_ij is presented, as a simple input pattern is learned by the network. In this simulation the learning parameter ε_L^(opt) has been set to a quite large value, such that learning occurs in a single step (fast learning).
4. Suppression of runaway synaptic growth. When a neural network is exposed repeatedly to the same, or to similar, external stimuli, unsupervised learning generally leads to uncontrolled growth of the involved synaptic strengths. This phenomenon, termed 'runaway synaptic growth', can also occur in networks with continuous self-generated activity, when similar activity patterns are auto-generated over and over again. Both kinds of runaway synaptic growth are suppressed by the proposed link dynamics (11).
5. Negative baseline. Note that w_ij = w^S_ij + w^L_ij enters the evolution equation (3) as θ(w_ij). We can therefore distinguish between active (w_ij > 0) and inactive (w_ij < 0) configurations, compare figure 3. The negative baseline W_L^(min) < 0 entering (11) then allows for the removal of positive links and provides a barrier against small random fluctuations, compare section 3.2.
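Points 1-4 above can be condensed into a schematic update step. The following sketch illustrates the competition mechanism only and is not the paper's exact equations (11) and (12): the threshold value x_c, the matrix convention wL[i, j] for the link from site j to site i, and the plain Euler step are assumptions, while ε_L^(opt), W_L^(min) and r^(opt) take the values quoted above.

```python
import numpy as np

EPS_OPT = 0.0008   # learning rate epsilon_L^(opt), value from the text
W_MIN = -0.01      # negative baseline W_L^(min), value from the text
R_OPT = 0.2        # optimal effective incoming signal r^(opt), from the text
X_C = 0.85         # activity threshold x_c (assumed value)

def step_long_term_links(wL, x, dt=1.0, eps_forget=0.0):
    """One schematic update of the long-term links w^L_ij: links between
    co-active sites are shifted towards the working point, r_tilde_i -> r_opt,
    so incoming synapses compete for a fixed total; positive links decay
    weakly (forgetting, d(w) = theta(w) w); all links are bounded below
    by the negative baseline W_MIN."""
    active = x > X_C
    # effective incoming signal: sum of active, positive incoming links
    r_tilde = (np.maximum(wL, 0.0) * active[None, :]).sum(axis=1)
    coactive = np.outer(active, active)    # pre- AND post-synaptic site active
    np.fill_diagonal(coactive, False)
    dw = np.where(coactive, EPS_OPT * (R_OPT - r_tilde)[:, None], 0.0)
    dw -= eps_forget * np.maximum(wL, 0.0)  # weak forgetting of positive links
    return np.maximum(wL + dt * dw, W_MIN)
```

Because every co-active incoming link of site i receives the same correction ε_L^(opt)(r^(opt) − r̃_i), frequently active links accumulate strength while the total incoming signal stays pinned near r^(opt), which is what suppresses runaway growth in this sketch.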
During a transient state we have x_i → 1 for all vertices belonging to the winning coalition and x_j → 0 for all out-of-clique sites, leading to r̃_i → Σ_{j ∈ active sites} w_ij, compare equation (9). The working-point optimization rule (10), r̃_i → r^(opt), is therefore equivalent to a local normalization condition enforcing the sum of active incoming link strengths to be constant, i.e. site-independent. This rule is closely related to a mechanism for the self-regulation of the average firing rate of cortical neurons proposed by Bienenstock et al [28].

Online learning
The neural network we consider here is continuously active, independent of whether there is sensory input via equation (6) or not, since the evolution equations (2) and (4) generate a never-ending time series of transient states. It is a central assumption of the present study that continuous, self-generated neural activity is a conditio sine qua non for modelling overall brain activity or for developing autonomous cognitive systems [3]. The evolution equations for the synaptic plasticities, namely (8) for the short-term memory and (11) for the long-term memory, are part of the dynamical system, viz they determine the time evolution of w^S_ij(t) and of w^L_ij(t) at all times, irrespective of whether external stimuli are presented to the network via (6) or not. For a continuously active neural network, the evolution equations for the synaptic plasticities therefore need to fulfil two general conditions.
(a) Under training conditions, namely when input patterns are presented to the system via (6), the system should be able to modify the synaptic link strengths accordingly, such that the training patterns are stored in the form of new memories, viz cliques representing attractor ruins and leading to quasi-stationary states.
(b) In the absence of input, the ongoing transient-state dynamics will constantly lead to synaptic modifications via (8) and (11). These modifications must not induce qualitative changes, such as the autonomous destruction of existing memories or the spontaneous generation of spurious new memories. New memories should be acquired exclusively via training by external stimuli.
In the following section we will present simulations in order to investigate these points. We find that the evolution equations formulated in this study conform with both conditions (a) and (b) above, due to the optimization principle for the long-term synaptic plasticities in equation (11).

Simulations
We have performed extensive simulations of the dynamics of the network with ongoing learning, for systems with up to several thousand sites. We found that the dynamics remains long-term stable even in the presence of continuous online learning governed by equations (8) and (11), exhibiting semi-regular sequences of winning coalitions, as shown in figure 2. The working point is regulated adaptively: no prolonged periods of stasis or trapped states were observed in the simulations, nor did periods of rapid or uncontrolled oscillations occur.
Any system with a finite number of sites N and a finite number of cliques settles in the end, in the absence of external signals, into a cyclic series of transient states. Preliminary investigations of systems with N ≈ 20-100 resulted in cycles spanning on average a finite fraction of the set of all cliques encoded by the network. This is a notable result, since the overall number of cliques stored in the network can easily be orders of magnitude larger than the number of sites N itself, compare equation (1). Detailed studies of the cyclic behaviour of autonomous networks will be presented elsewhere.

Table 1. Learning results for systems with N sites, N_links excitatory links and N_2, ..., N_6 cliques containing 2, ..., 6 sites. N_tot is the total number of memories to be learned; N_l and N_par denote the number of memories learned completely/partially.

   N   N_links   N_2   N_3   N_4   N_5   N_6   N_tot   N_l   N_par
  20     104       1    10    42    11     1      65    60       3
 100     901      26   563   122     2     0     713   704       7
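The settling of the autonomous dynamics into a cyclic series of transient states, discussed above, can be checked numerically on the symbolic level. A minimal sketch, assuming the sequence of winning coalitions has been recorded as a list of clique indices and that the tail of the recording is exactly periodic (in the full system the hidden reservoir variables must also recur for a true cycle):

```python
def tail_period(seq, reps=3):
    """Smallest period p such that the last reps*p entries of `seq`
    repeat exactly with period p; None if no such period is found."""
    n = len(seq)
    for p in range(1, n // reps + 1):
        tail = seq[n - reps * p:]
        if all(tail[i] == tail[i + p] for i in range(len(tail) - p)):
            return p
    return None
```

Requiring several repetitions (`reps`) guards against mistaking an accidental recurrence of a single clique for a genuine cycle.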

Learning of new memories
Training patterns {p_1, ..., p_Z}, presented to the system externally via equation (6), are learned by the network via the activation of the short-term memory for the corresponding intra-pattern links. In figures 6 and 7 we present a case study. The ongoing internal transient-state dynamics is interrupted at time t = 400 by an external signal which activates the short-term memory, see figure 6. Note that the short-term memory is activated both by external stimuli and internally, whenever a given link becomes active, i.e. when the pre- and the post-synaptic sites are simultaneously active. The internal activation does, however, not lead to the internal generation of spurious memories, since internally activated links belong in any case to one or more already existing cliques.
In figure 7 we present the time development of the respective long-term synaptic modifications w^L_ij(t). The learning parameters chosen here allow for fast learning; the pattern corresponding to the external signal, retained temporarily in the short-term memory, is memorized in a single step, viz the corresponding w^L_{3,6}(t) becomes positive before the transition to the next clique takes place. For practical applications smaller learning rates might be more suitable, as they avoid the learning of spurious signals generated by environmental noise.
In table 1 we present results for the training of two networks, with N = 20 and N = 100, from scratch. The initial networks contained only two connected cliques, in order to allow for a nontrivial initial transient-state dynamics; all other links were inhibitory. Learning by training and storage of the externally presented patterns, using the same parameters as for figures 6 and 7, is nearly perfect. Our tests showed that the learning rate can be chosen over a very wide range. Here the training phase was completed, for the 100-site network, by t = 5 × 10^4. Coming back to the discussion in section 3.5, we conclude that the network fulfils condition (a) formulated there, being able to store training patterns efficiently, as attractor ruins in the form of cliques.

Link-asymmetry
We note that the Hebbian learning via the working-point optimization, equation (11), leads to the spontaneous generation of asymmetries in the link matrices, viz to w^L_ij ≠ w^L_ji, since the synaptic plasticity depends on the post-synaptic growth rates.
In figure 8 we present, for two simulations, the distribution of the link asymmetry w^L_ij − w^L_ji over all positive w^L_ij, for the 100-site network of table 1, at time t = 5 × 10^5. The distributions shown in figure 8 are particular realizations of steady-state distributions, viz they did not change appreciably over wide ranges of total simulation times.
(i) In the first simulation the network had been trained from scratch. The set of 713 training patterns was presented to the network for t ∈ [0, 5 × 10^4]. After that, for t ∈ [5 × 10^4, 5 × 10^5], the system evolved freely. A total of 3958 transient states had been generated at time t = 5 × 10^5, but the system had nevertheless not yet settled into a cycle of transient states, due to the ongoing synaptic optimization, equations (8) and (11). There were 661 cliques remaining at t = 5 × 10^5, as the link competition had led to the suppression of some seldom-used links.
(ii) In the second simulation, uniform and symmetric initial excitatory links w^L_ij → 0.12 had been set by hand at t = 0, for all intra-clique links. The same N = 100 network as in (i) was used and the simulation ran in the absence of external stimuli. All 713 cliques were still present at t = 5 × 10^5, despite the substantial reorganization of the link-strength distribution, from the initial uniform distribution to the stationary distribution shown in figure 8. A total of 4123 transient states had been generated in the course of the simulation, without the system entering a cycle.
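The histogrammed quantity can be extracted directly from the long-term link matrix. A small sketch, where the convention that wL[i, j] holds w^L_ij is an assumption:

```python
import numpy as np

def link_asymmetries(wL):
    """Return w^L_ij - w^L_ji for every positive link w^L_ij > 0,
    i.e. the quantity histogrammed in figure 8."""
    i, j = np.nonzero(wL > 0.0)   # indices of all positive links
    return wL[i, j] - wL[j, i]
```

Note that a link whose reverse partner is negative contributes a single, unpaired entry, so the empirical distribution need not be exactly odd around zero.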
For both simulations all evolution equations, namely (2) and (4) for the activities and reservoir levels, as well as (8) for the short-term memory and (11) and (12) for the long-term memory, determined the dynamics at all times t ∈ [0, 5 × 10^5]. The difference between (i) and (ii) lies in the way the memories were determined: via training by external stimuli, equation (6), in (i), or by hand in (ii).
Comparing the two link distributions shown in figure 8, we note their overall similarity, a consequence of the continuously acting working-point optimization. The main differences turn up for small link strengths, since the two simulations started from opposite extremes (vanishing/strong initial excitatory links). The details of the link distribution shown in figure 8 depend sensitively on the parameters. For the results shown in figure 8 we used, for illustrational purposes, ε_L^- = 0.1, which is a very large value for a parameter regulating weak forgetting. We also performed simulations with ε_L^- = 0, the other extreme, and found the link-asymmetry distribution to be somewhat more scattered.
Coming back to the discussion in section 3.5, we conclude that the network fulfils condition (b) formulated there, since essentially no memories acquired during the training stage were destroyed, and no spurious new memories were spontaneously created, during the subsequent free evolution.

Conclusions
We have investigated a series of issues regarding neural networks with autonomously generated transient-state dynamics. We have presented a general method allowing us to transform an initial attractor network into a network capable of generating an infinite time series of transient states. The resulting dynamical system has strictly contracting phase space, with a one-to-one adiabatic correspondence between the transient states and the attractors of the original network.
We then have discussed the problem of homeostasis, namely the need for the system to regulate its own working point adaptively. We formulated a simple learning rule for unsupervised local Hebbian-type learning, which solves the homeostasis problem. We note here that this rule, equation (11), is similar to learning rules shown to optimize the overall storage capacity for discrete-time neural networks [22].
We have studied a continuous-time neural network model using clique encoding and showed that this model is very suitable for studying transient-state dynamics in conjunction with ongoing learning-on-the-fly, for a wide range of learning conditions. Both fast and slow online learning of new memories are compatible with the transient-state dynamics self-generated by the network.
Finally we turn to the interpretation of the transient-state dynamics. Examination of a typical time series of subsequently activated cliques, such as the one shown in figure 2, reveals that the sequence of cliques is not random. Every single clique is connected to its predecessor via excitatory links; they are said to be 'associatively' connected [29]. The sequence of subsequently active cliques can therefore be viewed, cum grano salis, as an 'associative thought process' [29]. The possible use of such processes for cognitive information processing, however, still needs to be investigated.