Face Recognition in Complex Unconstrained Environment with An Enhanced WWN Algorithm

Abstract: Face recognition is one of the core and most challenging problems in the field of computer vision. Compared with computer vision systems, the human visual system can identify a target against a complex background quickly and accurately. This paper proposes a new network model derived from Where-What Networks (WWNs), which approximately simulates the information processing pathways (i.e., the dorsal and ventral pathways) of the human visual cortex and recognizes faces of different types, locations and sizes against complex backgrounds. To enhance recognition performance, a synapse maintenance mechanism and a neuron regenesis mechanism are both introduced. Synapse maintenance reduces background interference, while neuron regenesis dynamically regulates the neuron resources to improve the efficiency of network usage. Experiments have been conducted on human face images of 5 types, 11 sizes, and 225 locations in complex backgrounds. The experimental results demonstrate that the proposed WWN model can learn three concepts (type, location and size) simultaneously, and also show the advantages of the enhanced WWN-7 model for face recognition in comparison with several existing methods.


Introduction
Biometric recognition automatically establishes an individual's identity based on anatomical and behavioral characteristics [39]. With the development of artificial intelligence, computer vision, cognitive science and psychology, biometric recognition has become an important technique for national safety and public security, owing to its convenience and non-intrusiveness. It has been widely used in public security, financial systems, intelligent surveillance, information security, civil aviation and military security.
Compared with other biometric features, such as fingerprints, palm prints, irises, voice, gait, and ears, the human face has many distinct advantages [22]. Even at a long distance, facial characteristics can be extracted from camera images, which provides a convenient and non-intrusive way to monitor remotely. Moreover, the face has a richer structure and a larger area than other biometric features, so the face region is not easily occluded. Hence, face recognition has become an indispensable biometric authentication method and has drawn much attention from researchers in various domains over the past decades [17].
Performance of face recognition is affected by many factors, such as lighting conditions, pose variation, occlusion, low image resolution and complex backgrounds [6]. Face recognition in unconstrained environments therefore remains a challenging problem.

Modeling
This section presents the improved WWN model for face recognition in unconstrained environments. The WWN-7 model, which is the basis of the proposed model, is described first, followed by its associated algorithm. Subsequently, the synapse maintenance and neuron regenesis mechanisms are introduced to improve recognition performance.

The WWN-7 model
The Where-What Networks model [14] is an embodiment of the developmental network model [31,34]; its structure is shown in Figure 1. It is composed of three areas: the X area, Y area and Z area. As the sensor, the X area is the retina and the perception part of the agent, responsible for external information input. The entire Y area, as the "brain" of the agent, is enclosed in the skull. The Y area is not under the direct supervision of the outside world; it realizes the information exchange between the X and Z areas, acting as a "bridge". It is divided into four parts, Y1, Y2, Y3 and Y4, with different numbers of neurons. The preprocessing area is responsible for calculating the pre-action energy of neurons in the learning process to determine which neurons can fire (be activated). The Z area is the motor terminal of the agent, divided into three parts: LM (location motor), SM (size motor) and TM (type motor). LM and TM simulate the dorsal and ventral pathways of the human visual system, respectively.
In the WWN-7 model, the X area unidirectionally inputs images to the Y area, implying a bottom-up connection, while the connections between the Y and Z areas are bidirectional. The top-down connections indicate that the motor concepts of the Z area can guide the brain to learn, while the bottom-up connections of the Y area feed the network's (agent's) understanding of the image forward to the motor terminal.
(1) Connecting modes among neurons
Three connecting modes exist in the WWN-7 model: the bottom-up connection, the top-down connection and the lateral connection. The information transmissions from the X area to the Y area and from the Y area to the Z area are bottom-up connections, while the transmission from the Z area to the Y area is a top-down connection. Lateral connections among neurons in the same area take two modes, activation and inhibition, and exist in the Y and Z areas, as shown in Figure 2, a simplified connection diagram of WWN-7.

(2) Receptive fields
The corresponding neurons in the X and Y areas are locally connected. The Y area receives top-down and bottom-up input simultaneously. Only the connections between the X and Y areas, i.e., the bottom-up input, are discussed as an example in this section. The WWN-7 model passes the images perceived by the X area directly to the Y area. According to its location, each neuron in the Y area is connected to specific neurons in the X area. In the human visual system, inputs to the early processing areas determine the receptive fields in the visual cortex. Neuro-anatomical studies [13,26] also confirmed that some neurons in the anterior part of the cerebrum have small receptive fields, while some neurons in the later processing areas have large receptive fields. As Figure 1 shows, the four parts of the Y area have different neurons, and the sizes of their receptive fields increase gradually from Y1 to Y4: 7×7, 11×11, 15×15 and 19×19 pixels, respectively. All receptive fields are square. Each receptive field starts from the upper left corner of the image and shifts to the right and downward by one pixel at a time. The information obtained from the receptive field at each shift is input to a corresponding neuron in the Y area. This process continues until the entire image has been input.
(3) Effective fields
WWN-7 has three motor areas, TM, LM and SM, which characterize the behaviors of the network. In the network learning process, the motor areas provide the agent with the location, type and size as guidance. Through neuron activation, these concepts become related to the characteristics of the foreground that the X area inputs to the agent. If the bottom-up and top-down inputs match well, the neurons in the Y area that associate the characteristics with the concepts are likely to win the top-k competition. Based on this, the teacher sets up the inputs of the corresponding motor concepts, i.e., the top-down inputs, according to the foreground information. Such inputs make the neurons in the Y area pay more attention to the foreground, so the neurons that associate the characteristics with the concepts are more likely to fire repeatedly. Since the background is not related to the concepts of TM, LM or SM, only a small number of neurons in the Y area will learn the background, which benefits identifying the foreground against the complex background.

(4) Match of two inputs for the neurons in Y area
Each neuron in the Y area has a weight vector v = (v_b, v_t), corresponding to the area inputs (b, t), where b and t denote the bottom-up and top-down inputs, respectively. The pre-action energy of a neuron in the Y area is calculated as

r(v_b, b, v_t, t) = v̇ · ṗ, (1)

where v̇ is the unit vector of the normalized synaptic vector v = (v_b, v_t), and ṗ is the unit vector of the normalized input vector p = (ḃ, ṫ). The inner product in formula (1) evaluates the degree of match between v̇ and ṗ, because r(v_b, b, v_t, t) = cos(θ), where θ is the angle between the two unit vectors v̇ and ṗ. A neuron in the Y area fires and is updated when it matches the input vector ṗ well, i.e., when both the bottom-up match v̇_b · ḃ and the top-down match v̇_t · ṫ are high. This means that neurons that have learned the background have little chance to fire, because their top-down matches cannot be the best.
All the neurons in the Y area compete to fire through the top-k competition mechanism, which can be described as follows:

y_q = (r_q − r_{k+1}) / (r_1 − r_{k+1}) if 1 ≤ q ≤ k; y_q = 0 otherwise,

where the subscripts 1, q and k+1 indicate positions in the ranking list of the pre-action energies in descending order.
In this paper, only k = 1 is considered, so the winner neuron j can be identified by

j = arg max_{1 ≤ i ≤ c} r_i,

where c is the number of neurons in the corresponding area.
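As a minimal numeric sketch (not the authors' code), the cosine-based pre-action energy of formula (1) and the k = 1 winner selection can be written as follows; the function names and the tiny example vectors are illustrative only:

```python
import numpy as np

def preaction_energy(v_b, b, v_t, t):
    """Pre-action energy r = cos(theta) between the unit weight and input vectors."""
    v = np.concatenate([v_b, v_t])
    p = np.concatenate([b, t])
    v_unit = v / (np.linalg.norm(v) + 1e-12)
    p_unit = p / (np.linalg.norm(p) + 1e-12)
    return float(np.dot(v_unit, p_unit))

def top1_winner(energies):
    """Return the index j of the single firing neuron (top-k with k = 1)."""
    return int(np.argmax(energies))

# A neuron whose weight equals the input reaches r close to 1.
b = np.array([1.0, 0.0])
t = np.array([0.0, 1.0])
r_match = preaction_energy(b, b, t, t)
```

A neuron with an orthogonal weight vector would instead score near zero, which is why background-tuned neurons rarely win once the top-down concepts are supplied.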

Learning Algorithm: Lobe Component Analysis
Lobe component analysis (LCA) [13] is used as the learning algorithm of the WWNs model. In terms of individual development, individual differences lead to different learning rates. Just as students should be taught in accordance with their aptitudes, the network should develop an appropriate mechanism for each individual so that each neuron can achieve its best learning rate. It is therefore important to design a reasonable learning-rate mechanism on the basis of temporal optimality. In this work, the WWNs model is a neural network that roughly simulates the visual processing pathway of the human being. The learning process is not once and for all, but a progressive cycle: new knowledge is learned, knowledge that is not used is gradually forgotten, and repeated learning strengthens the memory of the agent. Accordingly, the learning rate of WWNs is designed to reflect these ideas, embodied in the following weight-updating formulas:

v_j(n) = ω_1(n) v_j(n−1) + ω_2(n) ṗ(n),
ω_2(n) = (1 + µ(n)) / n, ω_1(n) + ω_2(n) ≡ 1,

where ω_1(n_j) and ω_2(n_j) are the retention rate and the learning rate, respectively, both depending on the firing age n_j of neuron j. The retention rate ω_1(n_j) consolidates the knowledge that the neuron has already learned. The learning rate ω_2(n_j) incorporates the amnesic function µ(n):

µ(n) = 0 if n ≤ t_1; µ(n) = m(n − t_1)/(t_2 − t_1) if t_1 < n ≤ t_2; µ(n) = m + (n − t_2)/r if n > t_2,

where m = 2, r = 10000, t_1 = 10 and t_2 = 30 (based on the literature [13,33]). In this way, the learning rate and the retention rate are adjusted dynamically during weight updating. An appropriate retention factor makes the network learn fast in the early development stage; when the neuron age is large enough, the learning rate gradually tends to a constant. This mechanism thus keeps learning new knowledge persistently while forgetting knowledge that is used less.
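The amnesic weight update can be sketched as follows, assuming the piecewise amnesic function used in the LCA literature with the quoted parameters m = 2, r = 10000, t_1 = 10 and t_2 = 30; this is an illustration, not the authors' implementation:

```python
def mu(n, m=2.0, r=10000.0, t1=10, t2=30):
    """Amnesic function (assumed piecewise form): zero at first, then growing,
    so that older samples are gradually down-weighted."""
    if n <= t1:
        return 0.0
    if n <= t2:
        return m * (n - t1) / (t2 - t1)
    return m + (n - t2) / r

def rates(n):
    """Retention rate w1 and learning rate w2, with w1 + w2 == 1."""
    w2 = (1.0 + mu(n)) / n
    return 1.0 - w2, w2

# Incremental update of one synaptic weight v by repeated inputs x.
v, age = 0.0, 0
for x in [1.0, 1.0, 1.0, 1.0]:
    age += 1
    w1, w2 = rates(age)
    v = w1 * v + w2 * x   # v tracks the (amnesic) average of the inputs
```

For small ages, µ(n) = 0 and the update is a plain running average; for large ages, ω_2(n) approaches a small constant, so the neuron keeps learning slowly forever.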

Synapse Maintenance Mechanism
In this paper, the faces in the complex backgrounds have different types, locations and sizes. Especially with different receptive fields, the backgrounds have a great impact on the detection of face features. Therefore, the synapse maintenance mechanism (SMM) [31] is used to estimate the standard deviation of each pixel input from the X area to the Y area. During training, pixels whose values vary greatly (corresponding to the background) are suppressed, which reduces their influence on the calculation of the pre-action energy of the neurons in the Y area.
Receptive fields in computer vision are generally regular geometric shapes, while object outlines in the real world are arbitrary, so a receptive field often contains part of the background. In WWN-7, the receptive field of a neuron in the Y area is also a regular square, and the background obviously interferes with the recognition of the foreground. The SMM is therefore introduced into the WWN model to solve this problem. The SMM is expected to distinguish the foreground from the background within the receptive field; once the foreground outline is identified, the background can easily be suppressed. It estimates the standard deviation of each pixel in the receptive field and restrains pixels with large standard deviations, decreasing the influence of unstable pixels on the pre-action energy of the neuron. This mechanism thus reduces the influence of the background on neuron activation.
Suppose the input of a Y neuron is p = (p_1, p_2, ..., p_d) and its weight is v = (v_1, v_2, ..., v_d). The matching degree between v_i and p_i can be estimated by the deviation

σ_i(n) = ω_1(n) σ_i(n−1) + ω_2(n) |p_i(n) − v_i(n)|,

where σ_i evaluates the matching degree between the input and the weight of the neuron in the Y area, and ω_1 and ω_2 are the retention and learning rates defined above. According to the matching degree σ_i, each neuron decides to extend or retract each synapse dynamically. In the Hebbian algorithm, v_i is a good estimate of the expected value of p_i; if v_i is estimated well, σ_i can be seen as the standard deviation of the input pixel, so the brightness dispersion of the input pixel at the same position is estimated by σ_i. A higher σ_i indicates a higher dispersion degree, meaning the input pixel is more unstable and the synapse should be retracted. A fixed threshold could be used to determine the synapse state, but the synapse would then extend and retract repeatedly whenever σ_i approaches the threshold. So a smooth synaptogenic factor f_i is used instead:

f_i = σ̄ / (σ_i + ε),

where σ̄ is the average deviation over the receptive field, and ε is an infinitesimal quantity used to avoid a zero denominator. f_i is then normalized to prevent it from becoming too large.
The synaptogenic factor f_i reflects the connection strength between a pixel and the corresponding neuron. Its range is between 0 and 1: f_i = 1 means the corresponding pixel is completely connected, while f_i = 0 means it is completely suppressed.
Synapse maintenance prunes the input p and the weight v as follows:

p′_i = f_i p_i, v′_i = f_i v_i.

Then p′_i and v′_i are used to calculate the pre-action energy of the neuron in the Y area. Interested readers are referred to [30] for a more concrete description of the SMM.
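The whole pruning step can be sketched as follows; the reciprocal-and-normalize form of f_i is an assumption consistent with the description above, and the published SMM formulas may differ in detail:

```python
import numpy as np

def synaptogenic_factors(sigma, eps=1e-6):
    """Smooth factor f_i: pixels with deviation above average are attenuated.
    The reciprocal form and the max-normalization are assumptions here."""
    f = np.mean(sigma) / (sigma + eps)
    return np.clip(f / f.max(), 0.0, 1.0)

def prune(p, v, f):
    """Trim input and weight by the synaptogenic factor before computing energy."""
    return f * p, f * v

sigma = np.array([0.01, 0.02, 0.50, 0.60])   # last two pixels: unstable background
f = synaptogenic_factors(sigma)
p = np.ones(4)
v = np.ones(4)
p2, v2 = prune(p, v, f)
```

Stable (low-deviation) pixels keep factors near 1, while the unstable background pixels are driven toward 0 and so contribute almost nothing to the pre-action energy.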

Neuron Regenesis Mechanism
Hebbian learning changes the synaptic weights between neurons, while the neuron regenesis mechanism ("new birth, renewal") gives inactive neurons the chance to fire again and make new connections with other neurons. The firing frequency is used to evaluate a neuron's activation degree. Neurons with low firing frequencies are regarded as inactive, i.e., they seldom win the competition. They are possibly suppressed by their surrounding active neurons and cannot learn new knowledge about the foreground. If these inactive neurons never get a chance to regenerate and learn new features, neuron resources are greatly wasted. The neuron regenesis mechanism is therefore designed to regulate the neuron resources and hence promote the performance of WWNs.
At time t, the firing frequency is defined as

f(t) = n(t) / N(t),

where f(t) is the firing frequency of the neuron, n(t) is the neuron age (number of firings) and N(t) is the current running time of the network. Accordingly, at time t+1 the firing frequency is

f(t+1) = n(t+1) / N(t+1),

and the current running time of the network increases by one unit:

N(t+1) = N(t) + 1.

To sum up, the firing frequency of the neuron can be calculated recursively as

f(t+1) = (f(t) N(t) + a(t+1)) / (N(t) + 1),

where a(t+1) = 1 if the neuron fires at time t+1 and 0 otherwise. The firing frequency of a neuron is then compared with those of its neighbors to determine which neuron should be regenerated. In 3D space, one neuron has six nearby neurons at a distance of 1. After a neuron wins the competition and fires, it is compared with these six neighbors. Suppose neuron A is the winner and neuron B is one of its six neighbors. The following criterion is adopted to compare the firing frequencies of the neurons:

n_A > n_0 and f_B < f_A / 4. (15)
Only neurons at a high activity level have the ability to suppress other neurons, which means that the "age" n_A (the number of firings) of neuron A must already be high. In expression (15), n_0 is the "age" threshold, set to 40 in this work, i.e., n_0 = 40. In addition, the firing frequency of the nearby neuron must be relatively low; in this paper, it should be less than 1/4 of that of neuron A. If the criterion is met, neuron B dies and then regenerates to learn new knowledge.
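The recursive frequency update and criterion (15) can be sketched as follows (a minimal illustration; n_0 = 40 and the 1/4 ratio come from the text, everything else is an assumed toy setup):

```python
def update_frequency(f_t, N_t, fired):
    """Recursive update: f(t+1) = (f(t) * N(t) + a(t+1)) / (N(t) + 1)."""
    return (f_t * N_t + (1 if fired else 0)) / (N_t + 1)

def should_regenerate(age_A, f_A, f_B, n0=40):
    """Criterion (15): winner A must be mature and neighbor B rarely firing."""
    return age_A > n0 and f_B < f_A / 4

# Track one neuron's firing frequency over four time steps.
f, N = 0.0, 0
for fired in [True, True, False, True]:
    f = update_frequency(f, N, fired)
    N += 1
```

After firing on three of four steps, the frequency is 3/4, matching the direct ratio n(t)/N(t), which is exactly what the recursion preserves.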

How the Y Neurons Learn the Object Features
The firing of all Y neurons goes through the top-k competition: only the winners can fire and update their weights, i.e., the object features they have learned. In the top-k competition, a neuron's activity is evaluated by its pre-action energy, and only the k neurons with the largest pre-action energies can fire. The pre-action energy is composed of four parts: the action value corresponding to the X input (bottom-up input), and the action values corresponding to the LM, SM and TM inputs (top-down inputs). Only when all four action values are large (match well) can the neuron fire. A firing neuron learns the corresponding feature of the input image and connects with the corresponding neurons in LM, SM and TM (supervised by the external environment); in other words, it connects the features with the concepts. For this neuron to keep firing frequently afterwards (its age increasing, becoming more mature, learning a more stable feature), the feature in the X layer and the corresponding concepts in LM, SM and TM must occur simultaneously, i.e., the feature in the X layer must be relevant to these concepts. Conversely, if some feature in the X layer has almost no relevance to the concepts in LM, SM and TM, this feature is difficult to learn: even if the neuron corresponding to this feature fires a few times occasionally, its firing frequency will not be high, since an irrelevant concept cannot ensure that all four pre-action values are high. Such neurons should die and regenerate through the neuron regenesis mechanism to learn new features. Therefore, most neurons in WWNs learn the foreground features while few learn the background, because, compared with the foreground, the background has almost no relevance to the concepts denoted by LM, SM and TM, which are provided by the external teacher.

Experiment Design and Analysis
As introduced in the previous section, the improved WWN-7 model is designed to identify human faces against complex natural backgrounds. The effects of the synapse maintenance and neuron regenesis mechanisms are shown in the experiments. The internal weights are analyzed to demonstrate the accuracy of face recognition with the proposed WWN-7 model, and comparative experiments show its advantages over several existing methods.

Image Library
The face pictures used in this paper are taken from the LFW face database, and the complex natural backgrounds are selected randomly from natural images of 38×38 pixels, as shown in Figure 3.
The objects to be recognized are 5 different human faces, each with 11 different sizes, so there are 55 foreground variants in total; the number of possible face locations in the complex backgrounds ranges from 5×5 to 15×15, as shown in Table 1. For each face type, images with even sizes (24×24 pixels, 26×26 pixels, etc.) are used for training, while those with odd sizes (25×25 pixels, 27×27 pixels, etc.) are used for testing, so there are 25 face images to be tested in the experiment. Each face can appear at all possible locations in the backgrounds, from which the recognition rate is calculated.
In the training phase, the network is first randomly assigned a complex natural background, and then the foreground object is embedded in the background. After the foreground objects at all locations have been trained, the WWN has completed one whole training pass.

Inputs and Outputs
The inputs from the X to the Y layer are introduced first. The connections between the X and Y layers are local, and different receptive fields capture different characteristic information. The network involves receptive fields of 7×7, 11×11, 15×15 and 19×19 pixels, corresponding to the four parts of the Y layer, respectively. Take the receptive field of 7×7 pixels as an example: the retina can only capture a square range of 7×7 pixels at a time. Each receptive field starts from the top left corner of the image and moves down by one pixel at a time; when it reaches the bottom of the image, the receptive field moves one pixel to the right and starts to scan from the next column. This process is carried out until the entire image has been input. The scanned data are stored in sequential order, yielding the inputs of the first part of the Y layer. The remaining receptive fields repeat this process, giving the inputs of the other three parts of the Y layer in the same way.
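The column-by-column scan described above can be sketched as follows (illustrative NumPy code, not the authors' implementation; the function name is assumed):

```python
import numpy as np

def scan_receptive_fields(image, size):
    """Slide a square window over the image, moving down one pixel at a time,
    then one column to the right, collecting each patch as one neuron input."""
    H, W = image.shape
    patches = []
    for col in range(W - size + 1):        # shift right, column by column
        for row in range(H - size + 1):    # scan down within each column
            patches.append(image[row:row + size, col:col + size].ravel())
    return np.array(patches)

image = np.zeros((38, 38))
patches = scan_receptive_fields(image, 7)
# A 38x38 image and a 7x7 window give (38-7+1)^2 = 32x32 = 1024 patches,
# each of 49 pixels, matching the per-sub-layer neuron count of Y1.
```

Repeating the scan with window sizes 11, 15 and 19 yields the inputs of the other three parts of the Y layer.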
There are independent inputs from the three parts of the Z layer to the Y layer. Because there are five different human faces, the input dimension of TM is 5×1. Due to the 11 different sizes of the face images, the input dimension of SM is 11×1. The location input must cover all possible locations, so the input dimension of LM is 225×1, as shown in Table 2.
The above sections illustrate that different receptive fields correspond to different parts of the Y layer. Take the receptive field of 7×7 pixels as an example: there are 9 sub-layers in Y1 and the input image dimension is 38×38 pixels. Each scanning window corresponds to a neuron, so there are (38−7+1)×(38−7+1)×9 = 32×32×9 neurons in the Y1 sub-layer. In a similar way, the neuron numbers of the Y2, Y3 and Y4 sub-layers can be obtained, as shown in Table 1. The Y layer receives the inputs from the X layer and the three kinds of inputs (type, size and location) from the Z layer simultaneously. The pre-action energy of a Y neuron indicates the matching degree between the input and its corresponding weight, which represents the memory of the neuron: the greater the pre-action energy, the more consistent the input is with the memorized information, and the easier it is for the neuron to fire. After the pre-action energies are obtained, the neurons in the Y layer compete for the output according to the top-k competition mechanism.

Calculation and Distribution of Neurons
Before calculating the pre-action energy, the neuron input should be normalized. The variables input and weight represent the input and the connection weight between layers, respectively, while epsilon is a small positive number. The normalization process is as follows: (1) find the minimum and maximum of each row, forming the vectors min and max, respectively; (2) calculate the difference between the maximum and the minimum, introducing epsilon to avoid a zero denominator: di = max − min + epsilon; (3) execute the normalization: input = (input − min)./di; weight = (weight − min)./di.
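Steps (1) to (3) can be sketched as follows, assuming the row minima and maxima are taken from the input matrix and reused for the weights (the function name is illustrative):

```python
import numpy as np

def normalize_rows(input_mat, weight_mat, epsilon=1e-6):
    """Row-wise min-max normalization of the input, reusing the same
    min/range for the weights so both stay on a comparable scale."""
    mn = input_mat.min(axis=1, keepdims=True)   # (1) row minima
    mx = input_mat.max(axis=1, keepdims=True)   #     row maxima
    di = mx - mn + epsilon                      # (2) safe denominator
    return (input_mat - mn) / di, (weight_mat - mn) / di   # (3)

x = np.array([[0.0, 5.0, 10.0]])
w = np.array([[5.0, 5.0, 5.0]])
xn, wn = normalize_rows(x, w)
```

Each input row is mapped into [0, 1), and epsilon guarantees a non-zero denominator even for a constant row.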
After normalizing the inputs and weights, they are used to calculate the pre-action energy of the Y neurons in the next step.
The pre-action energy determines whether a neuron can fire. If it is high, the neuron becomes the winner of the top-k competition, fires, and updates its weight and age through the Hebbian learning mechanism.
Splitting formula (1) into parts, the pre-action energy of a Y neuron can be calculated as

r_b = v̇_b · ḃ, r_t = (1/3)(Z_TM + Z_LM + Z_SM), r = (1 − α) r_b + α r_t,

where r_b and r_t are the pre-action values of the bottom-up and top-down connections, respectively, a dot above a vector represents its normalization, and the subscripts b and t represent the bottom-up and top-down inputs, respectively. α is the weight of r_t in r. When the network is in the training state, α = 0.5; when the network is in the testing state, α = 0 and the pre-action energy of a Y neuron is determined only by the bottom-up input. r_t is made up of three parts, Z_TM, Z_LM and Z_SM, each accounting for 1/3. The obtained pre-action energy values are then ranked, and the top k neurons (k can be selected according to the network situation; in this work, k = 1) are activated. The energy value of the activated neuron is set to 1 and those of the remaining neurons are set to zero.
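A minimal sketch of the split energy (illustrative only; z_tm, z_lm and z_sm stand for the three top-down match values Z_TM, Z_LM and Z_SM):

```python
def preaction(r_b, z_tm, z_lm, z_sm, training=True):
    """r = (1 - alpha) * r_b + alpha * r_t, where r_t is the equal-weight
    average of the three motor matches; alpha = 0.5 in training, 0 in testing."""
    alpha = 0.5 if training else 0.0
    r_t = (z_tm + z_lm + z_sm) / 3.0
    return (1.0 - alpha) * r_b + alpha * r_t

r_train = preaction(0.9, 1.0, 1.0, 1.0, training=True)
r_test = preaction(0.9, 1.0, 1.0, 1.0, training=False)  # bottom-up only
```

Setting α = 0 at test time reflects that the motor concepts are supplied by the teacher only during training.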

Visualization of the Internal Weights
After the network has been trained for 20 epochs, Figure 4 visualizes the bottom-up weights of some neurons in the four sub-layers of the Y layer. Each small box in Figure 4 represents the bottom-up weight of a Y neuron, i.e., the feature that the neuron has learned. The boxes are separated by white lines, and their dimensions correspond to the sizes of the receptive fields. A black box represents a dormant neuron that has not been activated and remains in its initial state. In the small receptive field (7×7) of Y1, faces of different sizes share more similar characteristics, so more characteristics accumulate; most features the neurons have learned in this receptive field concern object margin orientation. In the largest receptive field (19×19) of Y4, the learned local or even global features can mainly be identified. Figure 4 indicates that most neurons have learned the face characteristics.
Furthermore, the receptive field of Y4 is visualized in Figure 5. Sub-figures a, b, and c show the top-down weights of the type, size and location, respectively. The weights are normalized and separated by white lines. The weight dimensions of a single neuron in sub-figures a and b are 5×1 and 11×1, respectively, which cannot be reshaped into a square; a box filled with zeros is therefore set up and the weight of a single neuron is placed into it for display. The weight dimension of a single neuron in sub-figure c is 225×1, which can be reshaped into a 15×15 square. The white spots at different positions represent different activated neurons. Visualizations of the synaptogenic factors and the standard deviations of the synapse matching are shown in Figure 6; each box in the figure represents a neuron. In sub-figure a, the foreground contour has been formed automatically by the synapse maintenance mechanism. In sub-figure b, the standard deviations of the synapse matching are constantly updated in accordance with Hebbian learning. A black box represents a dormant neuron that has not been activated.

Effect Demonstration of Neuron Regenesis
WWN-7 is "skull-closed", with autonomous developmental ability, so the network's internal resources can be regulated dynamically during learning. Here, dynamic resource regulation means that connections among neurons can be built up or cut off dynamically. Through continuous learning, WWN-7 autonomously strengthens correctly learned knowledge and weakens false cognition. Figure 7 visualizes the bottom-up weights of the Y neurons with the 19×19-pixel receptive field at different training epochs. One training epoch means that the WWN has learned all training foreground images at all possible locations in the complex background. The top left sub-image (epoch = 0) shows the initial state of the network: there are no connections among the neurons, which means the current WWN has a strong ability to learn new knowledge, just like a newborn baby with strong plasticity. The top right sub-image (epoch = 1) displays the object characteristics that the Y neurons have learned after the first training epoch. Some of these characteristics are obscure and difficult to distinguish; in other words, the current WWN is not yet sure whether the learned characteristics are useful and should be kept. The bottom right sub-image (epoch = 5) illustrates the weights after 5 training epochs: some characteristics are now very distinct, and a few neurons have regenerated to learn new features. Through learning, the WWN has retained some useful features, and the neuron regenesis mechanism has begun to regulate the neuron resources. As the learning continues (the epoch number grows), more and more neurons enter the learning state and learn the object features.
The bottom left sub-image (epoch = 10) shows the weights after 10 training epochs: most features are very distinct, the neurons have been activated efficiently to learn knowledge, and the suppressed neurons have been reactivated to learn new features. Just as the human brain can produce new neurons to learn new knowledge, through the neuron regenesis mechanism the WWN can make suppressed neurons regenerate and learn new knowledge. So within a limited resource space (i.e., a fixed number of neurons), the WWN can learn efficiently and accurately.
Furthermore, to display the effect of neuron regenesis, the content in the red box in Figure 8 is magnified. Sub-figure (a) shows that, after one training epoch (epoch = 1), the features learned by the neurons in the top right corner contain parts of the foreground together with some background. Since the background has little relation to TM, LM and SM in the Z layer, the firing frequencies of these neurons are not high; through the neuron regenesis mechanism, they die and regenerate to learn the foreground. Sub-figure (b) shows that after 5 training epochs (epoch = 5), these regenerated neurons have learned new face characteristics that contain most of the foreground and little background. Figure 9 illustrates that, after 1 and 5 training epochs, the neurons in the corresponding red box are in the learning state, but the foreground features they have learned have not changed, and several neighbouring neurons learn the same or similar foreground characteristics. Sub-figure (c) shows that after 10 epochs, the face features learned by the neurons at the corresponding locations have changed, and the neighbouring neurons have begun to learn different face features, which means these neurons have gone through the process of dying and regenerating and have learned new foreground features. Figures 8 and 9 illustrate the dynamic regulation of the network's internal resources by the neuron regenesis mechanism: it not only makes neurons that have learned useless background information regenerate and learn new foreground features (shown in Figure 8), but also makes neurons with low firing frequencies, among neighbours that have learned the same or similar foreground features, regenerate and learn other foreground features, so that more foreground features can be learned (shown in Figure 9). The neuron regenesis mechanism is therefore a good supplement to Hebbian learning.

Performance Evaluation
Since WWN-7 learns three kinds of features (type, location and size), its performance can be evaluated from the recognition results of these three aspects. The recognition rate of the type is the proportion of test images that are correctly identified. The recognition error of the location is the Euclidean distance between the recognized location and the real location, and the recognition error of the size is the Euclidean distance between the recognized dimension and the real dimension; each error is averaged over all test images. The performance of WWN-7 is discussed in three situations: WWN-7 without synapse maintenance and neuron regenesis, WWN-7 with synapse maintenance only, and WWN-7 with both synapse maintenance and neuron regenesis. The recognition results are shown in Figure 10. Figure 10(a) shows that the recognition rate of WWN-7 without the two biological mechanisms reaches 98.1%, while WWN-7 with both mechanisms reaches 99.6%. The performance of WWN-7 with both mechanisms is better than that with synapse maintenance only; however, between epoch 4 and epoch 9 the former is worse than the latter. This results from the fact that, in the early stages of neuron regenesis, neurons that have learned part of the foreground characteristics together with some background die: their connections are cut off completely, their ages are reset to 1, and their weights return to their initial states. Before these neurons learn new features and set up stable connections, they do not contribute to the performance of WWN-7. But as training progresses, these neurons quickly learn more stable foreground features, so the overall performance is better.
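The evaluation measures just defined can be computed as in the following sketch (illustrative data and function names, not the paper's results):

```python
import numpy as np

def type_accuracy(pred, true):
    """Proportion of test images whose face type is identified correctly."""
    return float(np.mean(np.array(pred) == np.array(true)))

def mean_euclidean_error(pred, true):
    """Average Euclidean distance between recognized and real values,
    used for both the location error and the size error."""
    d = np.linalg.norm(np.array(pred, float) - np.array(true, float), axis=1)
    return float(d.mean())

acc = type_accuracy([0, 1, 2, 2], [0, 1, 2, 3])                          # 3 of 4 correct
loc_err = mean_euclidean_error([(10, 10), (13, 14)], [(10, 10), (10, 10)])  # mean of 0 and 5
```

The size error is computed in the same way, with one-dimensional size values in place of the (x, y) locations.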
Figure 10(b) shows that the location error of WWN-7 with both mechanisms is the smallest of the three cases, about 1 pixel. In the face images, the foreground objects (faces) differ little, while the backgrounds, randomly drawn from natural images, differ greatly. With the synapse maintenance mechanism, background interference is suppressed effectively, which greatly helps identify the types and detect the locations. Similarly, Figure 10(c) illustrates that the size (or dimension) error of WWN-7 with both mechanisms is the smallest of the three cases, close to 1 pixel. As previously mentioned, face images with even sizes are used in the training phase, while odd sizes are adopted in the testing phase. There is a one-pixel difference between an even size (e.g., 24×24 pixels) and the corresponding odd size (e.g., 25×25 pixels), so the best achievable dimension error is close to 1 pixel.
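The intuition behind how synapse maintenance suppresses background interference can be sketched as follows. This is our own simplification, not the paper's exact formulation: each synapse keeps a running deviation between its weight and its input, and synapses whose deviation is well above the layer average (typically connections to variable background pixels) are cut off:

```python
import numpy as np

def trim_synapses(weights, deviations, beta=1.5):
    """Simplified synapse maintenance: synapses whose running
    input-weight deviation exceeds beta times the mean deviation
    (unstable, background-like connections) are set to zero;
    stable foreground connections are kept."""
    keep = deviations <= beta * deviations.mean()
    return np.where(keep, weights, 0.0)
```

Because foreground pixels recur consistently across training samples while background pixels vary, the surviving synapses concentrate on the face region, which is why this mechanism helps both type identification and location detection.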
To sum up, WWN-7 with synapse maintenance significantly improves face recognition compared with the model without it. Adding the neuron regenesis mechanism further improves the recognition performance, though the additional gain is modest.

Performance Comparison
To further demonstrate the performance of WWN-7 on face recognition, three classical methods are employed for comparison: PCA with a third-order nearest-neighbor classifier [12], the BP neural network [38], and the robust sparse representation algorithm [7]. In the above experiments, WWN-7 was evaluated on three indexes: the recognition rate of types and the identification errors of locations and sizes. However, these classical methods can only classify the face images; the location and size of the face cannot be identified. Therefore, in this comparative experiment, only the recognition rate of the type is evaluated. The recognition results are provided in Tables 2 to 6. The PCA dimension and the sparse-representation feature dimension in Tables 2 and 4 are the dimensions of the feature vector after feature extraction. In the comparative experiments, the BP network has 5 input neurons, 10 hidden neurons, and 5 output neurons. Each person has 11 face images; 6 images are used to train the network and 5 to test (the training and test sets are the same as those used for WWN-7). The hidden-layer neurons use the 'tansig' transfer function, and the output-layer neurons adopt the 'purelin' transfer function. The network is trained with the 'traingdx' function, its weights and thresholds are initialized automatically, and training stops when the mean squared error (MSE) reaches 0.001. According to Tables 2 and 4, the PCA dimension is set to 140 and the sparse-representation feature dimension to 150. To reduce random error, the same recognition procedure is repeated 30 times for each of the six methods, from which the average, standard deviation, minimum, and maximum of the recognition rate are computed. The final results are shown in Table 6.
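The aggregation over the 30 repeated runs can be sketched as follows; the statistics computed are the four reported quantities (average, standard deviation, minimum, maximum), and the sample rates in the usage example are made up for illustration:

```python
import numpy as np

def run_statistics(rates):
    """Summarize recognition rates over repeated runs:
    returns (average, sample standard deviation, minimum, maximum)."""
    rates = np.asarray(rates, dtype=float)
    return rates.mean(), rates.std(ddof=1), rates.min(), rates.max()

# Usage with three illustrative (made-up) recognition rates:
mean, std, lo, hi = run_statistics([0.98, 0.99, 1.00])
```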
Table 6 shows that the recognition rates of the three classical methods are lower than those of WWN-7, for two reasons: the interference of the varying backgrounds, and the variation in the location and size of the foreground. The large standard deviations imply that the three classical methods cannot adapt well to changes in the location and size of the foreground against complex backgrounds. Owing to its biomimetic mechanisms, WWN-7 with synapse maintenance can seek out the foreground outline and separate the foreground from the background within the receptive field. Moreover, the neuron regenesis mechanism lets neurons suppressed in the competition regenerate and learn new foreground features, so the recognition rate of WWN-7 with both mechanisms is higher than that of the other approaches. These comparative experiments indicate that WWN-7 can effectively recognize and detect the types of human faces in complex backgrounds.
Finally, we compared the location and size errors of our work with those of the standard WWN-7 [35]. The results are provided in Tables 7 and 8, which show that the improved WWN-7 model, incorporating the synapse maintenance and neuron regenesis mechanisms, effectively improves the recognition of the location and size of the foreground in complex backgrounds. Of the two mechanisms, synapse maintenance contributes more to the performance improvement than neuron regenesis.

Computational Complexity
The previous part compared the performance of different methods on face recognition; in this part, we theoretically compare the computational complexity of these methods.
WWN is a typical feedforward network, and the computational complexity of a common feedforward network is O(n^4) [9], where n denotes the number of neurons in each layer. Since the number of layers in WWN is very small, its influence on the computational complexity can be neglected, giving a complexity of O(n^3) for WWN. Similarly, the computational complexity of the BP neural network is O(n^3), while that of PCA and the robust sparse representation algorithm is O(n), where n represents the data dimension.
Based on the above discussion, we can approximately assess the computational complexity of the three approaches. For the same learning task, WWN and the BP neural network have the highest computational complexity, which means a longer running time, while PCA and the robust sparse representation algorithm have lower complexity and thus relatively short computation times. Although WWN has the highest recognition rate among the three methods, how to design a better network architecture to increase its computation speed is an important research direction.
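To make the gap concrete, the following toy comparison (our own illustration, with idealized operation counts) shows how an O(n^3) cost, as for WWN and BP, grows against an O(n) cost, as for PCA and sparse representation, when the problem size doubles:

```python
def cost_cubic(n):
    """Idealized O(n^3) operation count (WWN / BP-style feedforward nets)."""
    return n ** 3

def cost_linear(n):
    """Idealized O(n) operation count (PCA / sparse representation,
    where n is the data dimension)."""
    return n

# Doubling n multiplies the cubic cost by 8 but the linear cost only by 2,
# which is why the gap in running time widens quickly with problem size.
ratios = (cost_cubic(200) // cost_cubic(100), cost_linear(200) // cost_linear(100))
```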

Conclusions and Future Work
This paper proposes an improved Where-What Networks (WWN) model that simulates the working mechanisms of the human visual system to identify human faces in complex backgrounds. The main internal mechanisms of Where-What Networks, e.g., the Hebbian learning rule, receptive fields, top-k competition, and update rules, are explained in detail in this work. To further study the internal mechanism of WWN-7, the bottom-up and top-down weights are visualized in the experiments. On top of the WWN-7 model, synapse maintenance is used to reduce background interference, while the neuron regenesis mechanism is designed to regulate the neuron resources dynamically and improve the network's usage efficiency. The performance of WWN-7 with and without the synapse maintenance and neuron regenesis mechanisms for face recognition in complex backgrounds is studied, and the influences of the two mechanisms are analyzed. Experimental results indicate that the average recognition rate of the improved model reaches 99.6%, and the location and size errors are reduced by 18% and 24%, respectively, when both mechanisms are applied. Although the model achieves good performance, it only recognizes different types of faces with different locations and sizes in complex backgrounds, without considering face pose or occlusion, which affect the final recognition rate. Moreover, its high computational complexity remains a problem to be solved.
Future research will consider a more detailed model of the extremely complex human visual system, extending the current work, which considers only the dorsal and ventral pathways. The objects to be recognized should also not be limited to static images but extended to moving objects in video.