One-to-one Mapping between Stimulus and Neural State: Memory and Classification

Synaptic strength can be seen as the probability of propagating an impulse, and a function could exist from propagation activity to synaptic strength. If this function satisfies constraints such as continuity and monotonicity, a neural network under external stimulus will always go to a fixed point, and there can be a one-to-one mapping between the external stimulus and the synaptic strengths at the fixed point. In other words, the neural network "memorizes" the external stimulus in its synapses. A biological classifier is proposed to utilize this mapping.


I. INTRODUCTION
It is still unanswered how memory works. Specifically, when exposed to external stimulus, how can a neural network memorize the information of the stimulus for a long time, if not permanently? Exposed to different stimuli, how can a neural network attain different memories? Ideally, memory should be a one-to-one mapping between external stimulus and a long-lasting state of the neural network. Known experimental results show that "neurons that fire together wire together"1,2: synaptic connections strengthen or weaken over time in response to increases or decreases in impulse propagation3. This biochemical mechanism, called synaptic plasticity4,5, establishes a relation between synaptic strength and impulse propagation frequency. We find that, with some "lightweight" constraints on this relation, a neural network under constant stimulus will always evolve to a state and stay there, and there can be a one-to-one mapping between the stimulus and the fixated state.
The remainder of the paper proceeds as follows. Section II identifies the constraints under which the synaptic plasticity of one synaptic connection leads to a fixed state and a one-to-one stimulus-state mapping (memory). Section III extends the concepts of fixed state and one-to-one mapping to a neural network consisting of many synaptic connections. Section IV proposes a biological classifier utilizing this memory.

II. SYNAPTIC CONNECTION AND ITS FIXED POINT
FIG. 1. Synaptic connection with strength s is directed from neuron 1 to neuron 2. Neuron 1 receives stimulus with probability x from the environment or upstream neurons. The synaptic connection propagates the nerve impulse (action potential) to neuron 2. As a result, neuron 2 receives the impulse with probability y; that is, neurons 1 and 2 "fire together" with probability y.
Let us start with one synaptic connection as shown in FIG 1. In nature, synapses are known to be plastic, low-precision and unreliable6. This stochasticity allows us to assume synaptic strength s to be the probability (reliability) of propagating a nerve impulse through7-9, instead of a weight scalar as in artificial neural networks10 (ANN). Easily we have y=xs where x, s, y∈[0, 1]. Now we treat synaptic plasticity, i.e. the relation between synaptic strength s and impulse propagation probability y, as a function

s* = λ(y).    (1)

Here s*∈[0, 1] represents the target strength a connection will be strengthened or weakened to over time if the connection is under constant propagation probability y, while s in y=xs represents the current strength. By y=xs and Eq. (1), we have s*=λ(xs), stating that, under constant stimulus probability x, a connection initialized with strength s will evolve towards s*. Our following reasoning hinges on this target strength function λ; we will put constraints on this uncharted function to see how they affect the dynamics of connection strength and, most importantly, how stimulus is one-to-one mapped to strength at the fixated state.

a) Electronic mail: lsz@fuyunresearch.org
Here is our first constraint: λ is continuous in y. This constraint is neurobiologically justifiable regarding synaptic plasticity, since a sufficiently small change in impulse probability would most probably result in an arbitrarily small change in synaptic strength. In that case, given any x, λ(xs) is a continuous function of s from the unit interval [0, 1] to the unit interval [0, 1], and according to Brouwer's fixed-point theorem11 (which states that, for any continuous function f mapping a compact convex set to itself, there is a point t such that f(t)=t) there must exist a fixed point s+∈[0, 1] such that s+=λ(xs+): connection strength at s+ will evolve to s+ and thus fixate, no longer strengthened or weakened. Moreover, as illustrated in FIG 2, given any initial value the strength always goes to a fixed point. Therefore, a gentle constraint of continuity on the λ function can reliably drive a synaptic connection to a fixed state.
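To make the convergence argument concrete, here is a minimal Python sketch. The linear λ below is an illustrative assumption, not a measured plasticity curve: strength is nudged toward the target λ(xs) each step and settles at the same fixed point regardless of its initial value.

```python
def lam(y):
    return 0.9 * y + 0.05  # one illustrative continuous target strength function

def evolve(x, s0, step=1e-3, iters=200000):
    """Nudge strength s toward the target strength lam(x*s) each iteration."""
    s = s0
    for _ in range(iters):
        target = lam(x * s)
        if target > s:
            s = min(s + step, 1.0)
        elif target < s:
            s = max(s - step, 0.0)
    return s

x = 0.8
# Analytic fixed point of s = 0.9*x*s + 0.05, i.e. s+ = 0.05 / (1 - 0.9*x)
s_plus = 0.05 / (1 - 0.9 * x)
# Starting from either extreme, the strength converges to the same s+.
low, high = evolve(x, 0.0), evolve(x, 1.0)
```

Because λ(xs)>s below the fixed point and λ(xs)<s above it, both trajectories approach s+ monotonically, matching the two tendencies of FIG 2.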
To verify connection strength's tendency towards fixed points, we design Algorithm 1 to simulate our connection model. In this simulation12, recent successful propagations are recorded and the success rate is supposed to approximate propagation probability y; connection strength changes by a small step ∆s each iteration in the direction of the target strength. As shown in FIG 3, we run the simulation for four typical target strength functions, and the resulting strength trajectories show that the constraint of continuity ensures the tendency towards fixed points given any initial strength.

FIG. 2. Two examples of λ(xs) are depicted as red bold lines, and fixed points as blue dots. (a) Given any initial s1<λ(xs1), there must exist a fixed point s+∈(s1, 1]; strength s tends to increase from s1 as long as target strength λ(xs)>s. Given any initial s2>λ(xs2), there must exist a fixed point s+∈[0, s2); strength s tends to decrease from s2 as long as target strength λ(xs)<s. Controlled by these two tendencies, s will reach and stay at a fixed point s+ such that s+=λ(xs+).

Algorithm 1 Connection strength's tendency to fixed points
…
7: pick random r1 and r2 from uniform distribution Unif(0, 1)
8: if x>r1 and s>r2 then
9:   stimulus received, impulse propagated: recorder[p]←1
…
11: if recorder has been traversed once (i≥10^4) then
12:   set y to the proportion of 1-entries in recorder
13:   set target strength: s*←λ(y)
14:   if s*>s then
15:     step-increase current strength: s←min(s+∆s, 1)
…

Among the four typical functions, λ(y)=0.9y+0.05 and λ(y)=−y+1 can lead to a one-to-one stimulus-strength mapping. Given any stimulus x, a synaptic connection equipped with one of these functions will have one single fixed point of strength regardless of its initial strength, such that the relation between stimulus and fixed-point strength can be treated as a function s+=θ(x). In FIG 4, simulation shows that θ(x) can be strictly monotonic and hence a one-to-one mapping between x and s+, such that θ(x) has one-to-one inverse function θ−1(s+). By contrast, FIG 5 shows that λ(y)=0.5sin(4πy)+0.5 cannot ensure the uniqueness of the fixed point and thus there is no such one-to-one θ(x); FIG 6 shows that there is no θ either for the discontinuous λ function in FIG 3(d).
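The recorded-success-rate scheme of Algorithm 1 can be rendered as runnable Python. The buffer size, step size, and the linear λ below are illustrative assumptions; the strength update fires once per full traversal of the recorder.

```python
import random

def simulate(lam, x, s0, buf=1000, step=0.01, iters=200000, seed=1):
    """Algorithm 1 sketch: the success rate over a recorder window
    approximates y, and strength steps toward the target lam(y)."""
    rng = random.Random(seed)
    recorder = [0] * buf
    s = s0
    for i in range(iters):
        r1, r2 = rng.random(), rng.random()
        # stimulus received (prob x) and impulse propagated (prob s)
        recorder[i % buf] = 1 if (x > r1 and s > r2) else 0
        if (i + 1) % buf == 0:            # recorder traversed once
            y = sum(recorder) / buf       # proportion of 1-entries
            target = lam(y)               # target strength s* = lam(y)
            if target > s:
                s = min(s + step, 1.0)    # step-increase
            elif target < s:
                s = max(s - step, 0.0)    # step-decrease
    return s

# With lam(y) = 0.9y + 0.05 and x = 0.8, the analytic fixed point is
# s+ = 0.05 / (1 - 0.72) ≈ 0.179; the stochastic strength hovers around it.
s_final = simulate(lambda y: 0.9 * y + 0.05, x=0.8, s0=0.9)
```

Because y is estimated from a finite window, the strength fluctuates around the fixed point rather than landing on it exactly, which is consistent with the noisy trajectories of FIG 3.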
In fact, by putting more constraints on λ, we can pinpoint the function θ as well as the conditions for it to be a one-to-one mapping. In addition to the constraint of continuity, let λ(y) be strictly monotonic on [0, 1] and hence one-to-one; let λ(0)≠0 to rule out the fixed point s+=0. In that case, λ has an inverse function λ−1(s) which is strictly monotonic between λ(0) and λ(1), and given any fixed-point strength s+ in between we can identify the stimulus x=λ−1(s+)/s+. That is, the function θ−1(s+)=λ−1(s+)/s+ exists. Let λ−1(s)/s be strictly monotonic between λ(0) and λ(1). Then given any stimulus x∈[0, 1] there is one single fixed point s+ such that x=λ−1(s+)/s+. That is, the function s+=θ(x) exists. Both λ(y)=0.9y+0.05 and λ(y)=−y+1 obey all those constraints, and their one-to-one θ functions can be verified by the simulation results in FIG 4, whereas λ(y)=0.5sin(4πy)+0.5 is not even strictly monotonic. However, neither λ(y)=0.9y+0.05 nor λ(y)=−y+1 is ideal for our purpose. Guided by these constraints, we choose λ functions carefully such that the derived θ(x) function is monotonically increasing and its range spans nearly the entire [0, 1] interval, as shown in FIG 7. Of all the λ constraints, continuity and strict monotonicity are reasonable requirements of consistency on the neurobiological process of synaptic plasticity, whereas λ(0)≠0 and strict monotonicity of λ−1(s)/s are rather specific and peculiar claims. Admittedly, those λ constraints need to be supported by neurobiological evidence.

FIG. 7. Simulation results show that λL leads to a linear-like θL in blue such that θL(x)≈x, and λT leads to a threshold-like θT in green.

Now we have one-to-one (continuous and strictly monotonic) functions λ, λ−1, θ and θ−1. Given s+ we can identify x and y without ambiguity, and vice versa.
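The construction of θ and θ−1 can be checked numerically for the illustrative λ(y)=0.9y+0.05: it is continuous, strictly increasing, has λ(0)≠0, and λ−1(s)/s is strictly increasing, so θ exists and here even has the closed form s+=0.05/(1−0.9x), obtained by solving s=λ(xs).

```python
def lam_inv(s):
    return (s - 0.05) / 0.9          # inverse of lam(y) = 0.9*y + 0.05

def theta_inv(s_plus):
    return lam_inv(s_plus) / s_plus  # recovers the stimulus x from s+

def theta(x):
    return 0.05 / (1 - 0.9 * x)      # fixed-point strength for stimulus x

# theta and theta_inv are mutual inverses: the connection "memorizes" x.
for x in (0.1, 0.5, 0.9):
    assert abs(theta_inv(theta(x)) - x) < 1e-9

# theta_inv is strictly increasing on [lam(0), lam(1)] = [0.05, 0.95],
# which is the constraint that makes theta one-to-one.
grid = [0.05 + 0.009 * k for k in range(101)]
assert all(theta_inv(a) < theta_inv(b) for a, b in zip(grid, grid[1:]))
```

Note that this particular θ is monotonically increasing but only spans [0.05, 0.5], which illustrates why λ(y)=0.9y+0.05 is not ideal and a carefully chosen λ (FIG 7) is preferred.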
Our interpretation of these mappings is that a synaptic connection at fixed point precisely "memorizes" the information of what (stimulus) it senses and how it responds (with impulse propagation).

III. NEURAL NETWORK AND ITS FIXED POINT
Now let us turn to the neural network shown in FIG 8. A neural network, as it turns out, can be treated as an "aggregate connection". We shall see that the definitions and reasoning for a neural network align well with those for the synaptic connection in the last section.
As with the synaptic connection, we can describe a neural network by defining (1) the external stimulus as an n-dimensional vector X∈[0, 1]^n in which each x_i is the probability of neuron i receiving stimulus; (2) the connections' strength as a c-dimensional vector S∈[0, 1]^c in which each s_ij is the strength of the connection from neuron i to neuron j (denoted as i→j); (3) the impulse propagation as a c-dimensional vector Y∈[0, 1]^c in which each y_ij is the impulse propagation probability through i→j. In fact, one single synaptic connection is a special case of a neural network with c=1 and n=2.

FIG. 8. Neural network (of one or multiple agents) consists of n≥2 neurons and c≥1 directed synaptic connections. An example of n=8 and c=7 is depicted. Each neuron receives stimulus from the environment with some probability and propagates nerve impulses out through synaptic connections, e.g., triggered by stimulus x1 neuron 1 propagates impulses stochastically down along the directed paths 1→7→8→2 and 1→7→8→5→3. A cyclic path (e.g. 3→8→5→3) is allowed but a loop (e.g. 3→3) isn't. Each neuron can have outbound or inbound connections, or neither, or both.
Stimulus and strength determine impulse propagation within the neural network, so there exists a mapping Ψ:(X, S)→Y. Presumably, the mapping Ψ is continuous in S. By Eq. (1) applied to each connection, there is a mapping Λ:Y→S* collecting the target strengths, so S*=Λ(Ψ(X, S)) is continuous in S; as in Section II, Brouwer's fixed-point theorem then yields a fixed point S+ such that S+=Λ(Ψ(X, S+)), and simulation (FIG 9) shows the tendency towards it.

FIG. 9. In this simulation, impulses traverse the neural network stochastically such that each neuron is fired at most once per iteration; synaptic connections update their strength as in Algorithm 1.
As with the synaptic connection, our goal is to establish a one-to-one mapping between stimulus X and fixed point S+ for the neural network. Generally the number of stable fixed points for a neural network is ∏_c f_ij, where f_ij is the number of stable fixed points for i→j. As in FIG 9(b), ∏_c f_ij can be enormous when f_ij≥2. Therefore, continuity of the λ function makes the neural network go to a fixed point, though not necessarily a unique one yet. With all λ constraints, we have: (1) Λ is a one-to-one mapping and thus has inverse mapping Λ−1:S*→Y; (2) there exists a mapping Θ:X→S+, because under stimulus X the neural network will go to the unique fixed point S+ no matter what initial strength S0 it starts with; (3) if Θ is a one-to-one mapping, Θ has inverse mapping Θ−1:S+→X. With the mappings Λ, Λ−1, Θ and Θ−1 being one-to-one, given S+ we can identify X and Y without ambiguity, and vice versa. Therefore, the same interpretation as for the synaptic connection applies: the neural network precisely "memorizes" the information about stimulus on many neurons and propagation across many connections.
Nevertheless, even all of the λ constraints are not sufficient to secure a one-to-one Θ:X→S+ for a neural network, as opposed to a single synaptic connection. Here is a case. For Θ to be one-to-one, all neurons must have outbound connections. Otherwise, e.g., for a neural network with three neurons (say 0, 1 and 2) and two connections (say 0→1 and 1→2), stimuli X1=(1, 1, 0) and X2=(1, 1, 1) will lead to the same fixed point, because stimulus on neuron 2, no matter what it is, affects no connection. Or equivalently, for Θ to be one-to-one, the definition of X should consider only the neurons with outbound connections, such that X's dimension dim(X)≤n. In the perspective of information theory13, a many-to-one Θ introduces equivocation to the neural network at fixed point, as if information loss occurred due to a noisy channel. If dim(X)>dim(S)=c, the mapping Θ conducts "dimension reduction" on stimulus X, and information loss is bound to occur. Here is another case. Consider a neural network with connections 0→2, 1→2 and 2→3, and stimulus X=(x0, x1). When the neural network is at fixed point, x2 = x0 s+02 + x1 s+12 − s+02 s+12 x0 Pr(1|0), where Pr(1|0) is the probability of neuron 1 being stimulated conditional on neuron 0 being stimulated. Then Pr(1|0) affects s+23 and hence S+; in other words, the neural network at fixed point gains the hidden information of Pr(1|0). However, if Pr(1|0) varies, given mere X there will be uncertainty about S+ such that the mapping Θ doesn't exist unless stimulus X is "augmented" to X=(x0, x1, Pr(1|0)).
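The first many-to-one case above can be checked numerically. A minimal sketch, assuming the illustrative λ(y)=0.9y+0.05 from Section II, whose per-connection fixed point has the closed form s+=0.05/(1−0.9p) for propagation probability p feeding the connection:

```python
def s_plus(p):
    return 0.05 / (1 - 0.9 * p)    # fixed point of s = lam(p*s)

def network_fixed_point(X):
    """Chain 0 -> 1 -> 2: x2 feeds a neuron with no outbound connection."""
    x0, x1, x2 = X
    s01 = s_plus(x0)
    # neuron 1 fires if stimulated externally or reached via 0 -> 1
    p1 = x1 + x0 * s01 - x1 * x0 * s01
    s12 = s_plus(p1)
    return (s01, s12)

# Stimuli that agree on neurons 0 and 1 reach the exact same fixed point,
# so Theta cannot be one-to-one over all three stimulus components.
same = network_fixed_point((1, 1, 0)) == network_fixed_point((1, 1, 1))
```

The fixed point S+=(s01, s12) simply never sees x2, which is the equivocation argument in concrete form.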

IV. AN APPLICATION FOR CLASSIFICATION
Memory can manifest only as impulse propagation. Ideally, a neural network with memory of stimulus X (formally, the mapping Θ casts the memory of stimulus X as fixed point S+) should respond to stimulus X more "intensely" than a neural network with a different memory responds to X. It is natural to differentiate responses by counting the neurons fired or synaptic connections propagated by impulses. In this section, we adopt the count of synaptic connections propagated as a macroscopic measure of how intensely memory responds to stimulus, or stimulus "recalls" memory. We will propose a classifier consisting of g neural networks, which classifies stimulus into one of g classes by the decision criterion of which neural network gets the most synaptic connections propagated. Reminiscent of supervised learning14, each neural network of our classifier is trained to its fixed point by its particular training stimulus, and then a testing stimulus is tested on all g neural networks independently to see which gets the most connections propagated. For simplicity we assume testing itself doesn't jeopardize the fixed points of the neural networks. And most importantly we assume that for each neural network, given any stimulus, there is one single fixed point such that the mapping Θ:X→S+ exists.
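The decision criterion above can be sketched in Python. The one-connection-per-pixel topology and the particular strength vectors below are assumptions for illustration (they mimic FIG 14(a) rather than reproduce the paper's trained networks): a test stimulus X propagates connection i of a network with probability x_i·s_i, and the predicted class is the network with the most propagated connections.

```python
import random

def count_propagated(X, S, rng):
    """Count connections propagated in one stochastic test."""
    return sum(1 for x, s in zip(X, S) if rng.random() < x * s)

def classify(X, trained, rng=None):
    """Return the index of the network with the most propagated connections."""
    rng = rng or random.Random(0)
    counts = [count_propagated(X, S, rng) for S in trained]
    return max(range(len(trained)), key=counts.__getitem__)

# Two hypothetical trained networks with complementary strong strengths:
trained = [[0.9] * 32 + [0.05] * 32, [0.05] * 32 + [0.9] * 32]
X = [1.0] * 32 + [0.0] * 32   # stimulus resembling class 0's training images
pred = classify(X, trained, random.Random(7))
```

Because the counts are random variables, the same test image can occasionally be assigned to different classes across tests, which is exactly the unreliability discussed below.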
Consider a neural network in the classifier trained by X̄ to fixed point S+ and then tested by X. In other words, the neural network memorizing X̄ as S+ is tested by X. Because impulses propagate across the neural network stochastically, the count of synaptic connections propagated in one test is a random variable; let it be Z_X̄X, the sum over the c connections of indicators z_ij∈{0, 1} of whether i→j propagates in the test. By the central limit theorem, Z's distribution could tend towards Gaussian-like (bell curve) as c increases, even if all z_ij are not independent and identically distributed. We have

E[Z_X̄X] = ∑_c E[z_ij] = ∑_c y_ij.    (2)

And when c is large,

Z_X̄X ≈ Normal(E[Z_X̄X], Var[Z_X̄X]).    (3)

For any i→j, in the training stage, because S+=Θ(X̄) we have s+_ij=θ_ij(X̄); in the testing stage, x_i is uniquely determined by S+ and X, such that x_i is a function of X̄ and X.
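The sum-of-indicators view of Z can be simulated directly. Strengths and stimuli below are illustrative assumptions; the sample mean of Z should match ∑ x_i·s_i, and the histogram of Z is bell-shaped for moderate c.

```python
import random

rng = random.Random(0)
c = 64
S = [rng.uniform(0.1, 0.9) for _ in range(c)]   # assumed fixed-point strengths
X = [rng.uniform(0.0, 1.0) for _ in range(c)]   # assumed test stimulus

def one_test():
    """One stochastic test: count connections propagated (sum of Bernoullis)."""
    return sum(1 for x, s in zip(X, S) if rng.random() < x * s)

samples = [one_test() for _ in range(20000)]
mean_z = sum(samples) / len(samples)
expected = sum(x * s for x, s in zip(X, S))      # E[Z] per Eq. (2)
```

Over many tests the sample mean of Z converges to the analytic expectation, which is what lets the classifier compare expected response intensities across networks.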
We experiment with this classifier to classify handwritten digit images15. Ten identical neural networks (hence g=10) of FIG 10, each designated for a digit from 0 to 9, are trained to their fixed points by their training images in FIG 11 as stimulus, and then testing images, also as stimulus, are classified into the digit whose designated neural network gets the biggest Z value. We run many tests to evaluate classification accuracy, and collect Z values to approximate r.v. Z's distribution. With all synaptic connections equipped with λL of FIG 7, the classifier has accuracy ∼44%, and ∼51% with λT. Note that, equipped with λL or λT, the neural network of FIG 10 will have one-to-one ΘL or ΘT according to the last section. FIG 12 and FIG 13 show that, in positive testing (e.g. a digit-6 image is tested in the neural network trained by digit-6 images), Z's expected value (sample mean) can be considerably bigger than that in negative testing (e.g. a digit-6 image is tested in the neural network trained by digit-1 images), so as to discriminate digit-6 images from the others. Given the same testing image, the classification target can differ test by test since the ten Z outcomes are randomized. To improve classification accuracy, we shall distance the distribution of the positive-testing Z from those of the negative-testing Z as far as possible. We present another two special neural networks in FIG 14 to show how our classifier utilizes memory to classify images and how to improve its accuracy.

FIG. 11. A digit image has 8×8=64 pixels, and pixel grayscale is normalized to a value between 0 and 1 (by dividing by 16) as stimulus probability. The upper row shows examples of digit images, and the lower row shows better written "average images", each of which is actually the pixel-wise average of a set of images of a digit. Each neural network is trained in each iteration by the same "average image", or equivalently in each iteration by an image randomly drawn from the set of images.
When the classifier adopts the ten neural networks of FIG 14(a) and equips all connections with λL of FIG 7, classification accuracy is ∼31% and Z's distribution for testing digit-6 images is shown in FIG 15(a). We already know that λL makes θL(x)≈x. Then for one test we have

E[Z_X̄X] ≈ ∑_{i=1}^{64} x̄_i x_i = X̄·X.    (4)

Here X̄·X is the dot product of training vector X̄ and testing vector X.

FIG. 12. The histogram (in probability density form) of Z. To collect Z values, a digit-6 image is tested many times on each of the ten trained neural networks. All connections are equipped with λT. Z66 of positive testing is in red, and the other nine Zk6 of negative testing, where k=0, 1, 2, 3, 4, 5, 7, 8, 9, are in gray. Z's sample mean for each digit is depicted as a vertical dotted line.

Generally, the dot product of two vectors, a scalar value, is essentially a measure of similarity between the vectors. The bigger E[Z_X̄X] is, the more intensely the neural network with memory of training X̄ responds to testing X, and the more similar X̄ and X are to each other. Therefore, Eq. (4) simply links otherwise unrelated neural response intensity and stimulus similarity. Comparing the ten E[Z_X̄X] values, we can tell which X̄ is the most similar to X and hence which digit is the classification target. Only, the Z_X̄X value from a test actually deviates around the true E[Z_X̄X] randomly, which makes it a usable and yet unreliable classification criterion.
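The dot-product reading of Eq. (4) can be shown with a tiny example. The four-pixel "average images" below are hypothetical stand-ins for the trained templates; ranking by X̄·X picks the template most similar to the test stimulus.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

template_1 = [0.9, 0.9, 0.1, 0.1]   # hypothetical "average image" of class 1
template_2 = [0.1, 0.1, 0.9, 0.9]   # hypothetical "average image" of class 2
test_x     = [1.0, 0.8, 0.0, 0.2]   # test stimulus, closer to class 1

# E[Z] per template is approximately the dot product with the test stimulus.
scores = {1: dot(template_1, test_x), 2: dot(template_2, test_x)}
best = max(scores, key=scores.get)
```

In an actual test the classifier sees a noisy Z around each score rather than the score itself, which is why its decision can vary test by test.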
When the classifier equips all connections with the threshold-like λT of FIG 7, classification accuracy rises to ∼44%. By comparing FIG 15(b) with FIG 15(a), the distance between Z66's distribution and the other nine Zk6,k≠6's distributions is bigger with the threshold-like λT than with the linear-like λL. This accuracy improvement can be explained conveniently with the threshold-like θT: by Eq. (4), of the sum terms in ∑64 x_i x̄_i, the threshold basically diminishes small x̄_i∈[0, xt) to 0 and enhances big x̄_i∈[xt, 1]. In FIG 15(c) the distance between the distributions of Z66 and Zk6,k≠6 is increased compared to FIG 15(a). Here our neurobiological interpretation regarding w_i(x̄_i) is that training stimulus affects not only synaptic strength, but also the growth of the neuron cluster, in terms of replication of neuron cells and formation of new synaptic connections.

FIG. 14. These two neural networks inherit the sensor-cluster structure of FIG 10. (a) Each sensor neuron connects to one single cluster neuron such that each pixel stimulus x_i only affects one single connection; then s+_i=θ_i(x̄_i). (b) Each sensor neuron connects to one cluster of many neurons such that each pixel stimulus x_i can affect more than one synaptic connection. Let testing x_i cause w_i synaptic connections to be propagated with probability x_i s+_i in a test, or none with probability 1−x_i s+_i, and let w_i be determined by training stimulus x̄_i.
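The separation effect of thresholding can be illustrated numerically. The four-pixel vectors and the cutoff x_t=0.5 below are assumptions for illustration: zeroing the small entries of the trained template suppresses its overlap with off-class images, widening the gap between positive and negative responses.

```python
def threshold(template, x_t=0.5):
    """Threshold-like theta: diminish small entries to 0, keep big ones."""
    return [v if v >= x_t else 0.0 for v in template]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

template  = [0.8, 0.9, 0.2, 0.1]   # hypothetical trained "average image"
on_class  = [0.9, 0.8, 0.1, 0.2]   # test image resembling the template
off_class = [0.2, 0.1, 0.9, 0.8]   # test image of another class

gap_linear    = dot(template, on_class) - dot(template, off_class)
t = threshold(template)
gap_threshold = dot(t, on_class) - dot(t, off_class)
```

Here the thresholded template loses a little on-class mass but sheds far more off-class overlap, so the positive/negative gap grows, mirroring the wider spacing of FIG 15(b) and (c) over FIG 15(a).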
TABLE I shows the performance of our classifiers with the four typical λ functions of FIG 3, to demonstrate how "pathological" target strength functions affect classification.
As mentioned, handwritten digit image classification could be simplified to a task of generic linear classification: given ten classes, each with its discriminative function δi(X)=X̄i·X, image X is classified to the class with the largest δi value.

TABLE I. Classification accuracy with the four typical λ functions of FIG 3.

λ function                 Accuracy
λ(y)=0.9y+0.05             14%    16%    19%
λ(y)=0.5sin(4πy)+0.5        5%a    6%     2%
λ(y)=−y+1                   4%     5%     1%b
Discontinuous λ            23%    20%    28%

a Accuracy under 10% is actually worse than wild guessing.
b When the classification criterion is changed to "which neural network gets the least synaptic connections propagated", accuracy is ∼40%.

Our neural classifier simply takes over the computation of the vectors' dot product X̄i·X and adds randomness to the ten results. To parameterize X̄i in the ten δi, "supervisors" could train the neural networks in the classifier with the images they deem the best: "average images" in our case, or digit learning cards in teachers' case. Our neural classifier is rather unreliable and primitive compared to an ANN, which is also capable of linear classification. On one hand, given the same image an ANN always outputs the same prediction result. On the other hand, an ANN is not only a classifier but, more importantly, a "learner", which learns from all kinds of handwritten digits to find the optimal X̄i for the ten δi; optimal X̄i minimize the error of misclassification, making an ANN more tolerant of poor handwriting and thus giving it better prediction accuracy. Only, an ANN's learning of optimal X̄i, an optimization process of many iterations, requires massive computational power to carry out, which is unlikely to be provided by a real-life nervous system; there is no evidence that an individual neuron can even conduct basic arithmetic operations. Despite its weaknesses, our neural classifier has merit in its biological nature: it reduces the computation of the vectors' dot product to simple counting of synaptic connections propagated; its training and testing could be purely neurobiological development and activity where no arithmetic operation is involved; its classification criterion, i.e. "deciding" or "feeling" which neural (sub)network has the most connections propagated, could be an intrinsic capability of intelligent agents.

V. CONCLUSION
This paper proposes a mathematical theory to explain how memory forms and works. It all begins with synaptic plasticity. We find that synaptic plasticity is more than stimulus affecting synapses; it actually acts as a force that can drive a neural network over time to a long-lasting state. We also find that, under certain conditions, there is a one-to-one mapping between the neural state and the stimulus that the neural network is exposed to. With the mapping, given the stimulus we know exactly what the neural state will be; given the neural state we know precisely what the stimulus has been. The mapping truly is a link between past event and neural present; between the short-lived and the enduring. In that sense, the mapping itself is memory, or the mapping casts memory in the neural network. Next, we study how memory affects a neural network's response to stimulus. We find that a neural network with memory of a stimulus responds to similar stimuli more intensely than to stimuli of less similarity, if response intensity is evaluated by the number of synaptic connections propagated by impulses. That is, a neural network with memory is able to classify stimuli. To verify this ability, we experiment with classifiers consisting of ten neural networks, and they turn out to have considerable accuracy in classifying handwritten digit images. Those classifiers demonstrate that neurons could collectively provide fully neurobiological computation for classification.
Our reasoning takes root in the mathematical treatment of synaptic plasticity as a target strength function λ from impulse frequency to synaptic strength. We put hypothetical constraints on this λ function to ensure that the ideal one-to-one mapping exists. Although these constraints are necessary to keep our theory mathematically sound, they raise concerns. Firstly, they could be overly restrictive. Take the continuity constraint for example. Even the discontinuous function of FIG 3(d), whose nonexistent θ function would map certain stimuli to any point within a "fixed interval" instead of a specific fixed point as shown in FIG 6, can be a usable λ in our classifier according to TABLE I. In this case, the fixed point per se doesn't have to exist, and the mere tendency to seek it out could serve the purpose. Secondly, as discussed in Section II, those λ constraints have yet to be supported by neurobiological evidence. Above all, evidence that reveals the true λ is vital to clarify the uncertainty.