Distributed Face Recognition in Wireless Sensor Networks

As one of the most successful applications of image analysis and understanding, face recognition has received significant attention in recent years. In order to construct an autonomous and robust biometric security system, this paper explores the application of face recognition techniques in wireless sensor networks. Given the limited resources of sensor nodes, new challenges remain to be met. In this work, a facial component-based recognition mechanism is first applied to ensure recognition accuracy. Second, to address the problem of resource constraints, a distributed scheme based on K-d trees is deployed for both face image transmission and retrieval. According to the simulation results, the proposed method achieves considerable energy efficiency while maintaining recognition accuracy.


Introduction
As one of the most important biometric techniques, face recognition (FR) has the clear advantages of being natural and passive compared with other biometric techniques that require cooperative subjects, such as fingerprint recognition and iris recognition [1]. A typical framework of an FR system is shown in Figure 1, including the procedures of enrollment and identification. FR has been widely used in access control, identification systems, and surveillance [2]. Nevertheless, in most FR systems, the biometric information is stored remotely in a central database, and the identification device communicates with the database over a traditional wired network [3,4]. To enhance the flexibility of FR systems, much recent research has explored the implementation of FR systems in wireless networks. For example, Zaeri et al. propose to apply face recognition in wireless surveillance systems [5]. Kim et al. implement a wireless face recognition system with low energy consumption based on the ZigBee protocol and principal component analysis (PCA) [6]. Chang and Aghajan focus on recovering face orientation for more robust face recognition in wireless image sensor networks [7]. Muraleedharan et al. propose to use a specific evolutionary algorithm to optimize routing in a distributed time-varying network for face recognition [8]. A face recognition system gains flexibility and cost efficiency when integrated into a wireless network. Meanwhile, the face recognition technique enhances the functionality and security of the wireless network [9].
In this work, a wireless dimension is added to FR systems by combining them with wireless sensor networks (WSNs) [10], another active research topic in recent years. WSNs have developed rapidly, particularly with the proliferation of microelectromechanical systems (MEMS) technology, which has facilitated the development of smart sensors [10,11]. A WSN typically consists of a number of sensor nodes (from a few tens to thousands) working together to monitor a region and obtain data about the environment [11]. Wireless sensor devices extend the functionality and capabilities of a static FR system by allowing users to capture facial images out in the field and compare them against remotely centralized biometric matching systems. For example, an automatic surveillance system works independently and sends out alerts whenever an intruder comes into view [4]. Sensor networks with self-organizing techniques that optimize nodes based on their capabilities and energy capacities are best suited for deployment in remote areas far from a central database. Power efficiency, optimization, and power scavenging are the only viable approaches in such an environment. Although WSN-based face recognition is a promising technique for robust and autonomous biometric security systems, new challenges in realizing it must be taken into consideration. Firstly, a wireless face recognition system requires considerable energy, computation, and bandwidth for image acquisition, processing, and transmission. The lifetime of a wireless network is severely reduced by the heavy burden of image processing and communication [12]. Secondly, the face recognition techniques need to remain robust and accurate when raw or compressed data are transmitted wirelessly. To address the issues stated above, a robust and efficient WSN-based FR technique is proposed in this work. The main contributions of this paper are twofold. Firstly, a robust facial components extraction method is implemented by combining the response of local feature detectors with a global geometrical constraint. Secondly, an energy-efficient distributed scheme for facial image transmission and retrieval is proposed for recognition. According to the experimental validation, the proposed system prototype shows a promising improvement over centralized recognition in both recognition accuracy and energy efficiency.

Related Work
A lot of efforts have been made to improve the network performance and recognition rate for face recognition application in WSNs. Razzak et al. reengineer linear discriminant analysis to implement a collaborative face recognition system in WSNs [12]. Muraleedharan et al. propose to apply the contourlet and wavelet compression for in-network compression and utilize the cognitive routing protocol to construct a flexible and cost effective distributed routing mechanism to allocate limited resources thus ensuring longevity of the network [13]. Yan and Osadciw propose a wireless system model for distributed face identification, which consists of feature net and database net. The resources of the wireless network are optimally allocated according to the constraints and requirements of the specific network environment [14]. The common feature of these prior works is to fully exploit the distributed nature of wireless networks and efficiently allocate the network resources to achieve the energy efficiency.
What is more, a key issue for the success of a FR system is finding efficient descriptors for face appearance [2]. The face recognition for still images is categorized into two main groups: holistic and component based [15]. In holistic methods, a single feature vector that represents the whole face is used as input to a classifier. Different holistic methods such as principal component analysis (PCA) [16], linear discriminant analysis (LDA) [17], and the more recent 2D PCA [18] have been studied widely. But lately, it has been argued that the component-based method outperforms holistic methods which use a full representation of the face image for recognition [19,20]. In this category, a flexible geometrical relation between the different facial components in the classification stage is allowed to compensate for pose changes [18]. To further consider the distribution characteristics of WSNs, component-based recognition method is applied instead of full representations for face recognition in our work. Figure 2 shows the overview of our proposed system, including three main steps. Firstly, a face region in the incoming probe image is extracted, followed by facial components detection. Secondly, facial components are transmitted through sensor nodes in a distributed manner. Finally, the procedure of recognition is implemented in a distributive database structure based on K-d tree.

Facial Components Detection
Face Detection.
To build a system capable of automatically identifying people by face appearance, it is first necessary to localize the face in the image. We used the OpenCV implementation of the AdaBoost cascade detector following Viola and Jones' method [21], which consists of three parts. The first is an efficient method of encoding the image as an "integral image." This allows the sum of pixel responses within a given subrectangle of an image to be computed quickly and is vital to the speed of the Viola-Jones detector. The second element is the application of a boosting algorithm known as AdaBoost [22] to select appropriate features that can form a template to model human face variation. The third part is a cascade of classifiers that speeds up the search by quickly eliminating unlikely face regions. The method requires a set of positive and negative image regions. For face detection we use 1,000 manually cropped face images as positive samples and 10,000 nonface images from the INRIA dataset [23] as negative samples. The detected face is further normalized by a transform that maps the positions of the two eyes to canonical positions.
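To illustrate why the integral image makes rectangle sums cheap, the following minimal Python sketch builds a summed-area table and evaluates a rectangle sum with four lookups. It is not the OpenCV implementation used in this work; the function names integral_image and rect_sum are hypothetical.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) summed-area table with a zero first row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above and to the left of (y, x), inclusive.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)
print(rect_sum(table, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

Once the table is built, the cost of a rectangle sum no longer depends on the rectangle size, which is what allows many rectangular features to be evaluated at every candidate window.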

Facial Components Extraction.
The facial feature models are constructed using the same method as each individual level of the Viola-Jones detector cascade. For all of the training samples for individual facial parts, we rescale the images so that the faces have an interocular distance of roughly 55 pixels. Positive samples are taken at the manually annotated part locations. Negative samples are taken at least 1/4 of the interocular distance away from the annotated locations. In addition, random image plane rotations within ±15° are used to synthesize additional training samples. The problem with individual detectors is that, if applied independently, they often fail to provide an accurate estimate of the landmark positions. This can be resolved by using a prior on the shape configuration of landmarks. The detection is typically carried out in two consecutive steps. Firstly, a set of candidate positions with large responses is chosen separately for each landmark. In the second step, the landmark configuration that best matches the shape prior is selected. Following [24] we assume that after aligning the points into a common coordinate frame, the distribution is a multivariate Gaussian, the parameters of which can be estimated from the training set. Thus for any combination of feature candidates $\mathbf{x} = (\mathbf{x}_1, \ldots, \mathbf{x}_N)$, we can measure the quality of a landmark configuration by a scoring function of the form

$$S(\mathbf{x}) = \sum_{i=1}^{N} r_i(\mathbf{x}_i) + \log \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}), \quad (1)$$

where $r_i$ is the response for each individual facial feature and $\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$ denotes a Gaussian distribution for $\mathbf{x}$ with mean location coordinates $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$. $N$ denotes the total number of facial components to be extracted. For our system, $N = 5$, which means that 5 facial features (two eyes, nose tip, and two mouth corners) are located on the face. To tolerate some degree of error in the component localization, a 5 × 7 grid is defined at each detected component [25], thus leading to 175 grid points from five components.
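The two-step selection described above can be sketched as follows. This is only an illustration under stated assumptions: the score is taken to be the sum of detector responses plus the log of the Gaussian shape prior, the covariance is assumed diagonal to keep the code short, the candidate coordinates are assumed to be already aligned to the common frame, and all names (log_gaussian_diag, best_configuration) and numbers are hypothetical.

```python
import itertools
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (simplifying assumption)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def best_configuration(candidates, mean, var):
    """candidates[i] is a list of (x, y, response) tuples for landmark i.

    Returns the combination that maximizes the assumed score:
    sum of responses + log shape prior.
    """
    best, best_score = None, -math.inf
    for combo in itertools.product(*candidates):
        shape = [c for (x, y, _) in combo for c in (x, y)]
        score = sum(r for (_, _, r) in combo) + log_gaussian_diag(shape, mean, var)
        if score > best_score:
            best, best_score = combo, score
    return best, best_score


# Hypothetical example with two landmarks (the paper uses five).
mean = [10, 10, 30, 10]          # expected (x, y) of each landmark
var = [4, 4, 4, 4]               # per-coordinate variance
candidates = [
    [(10, 10, 1.0), (20, 20, 1.2)],   # landmark 0: second has a larger response
    [(30, 10, 0.9), (5, 5, 1.5)],     # landmark 1: second has a larger response
]
best, score = best_configuration(candidates, mean, var)
print(best)
```

In this example the candidates with the strongest individual responses are rejected because they fit the shape prior poorly, which is the failure mode of independent detectors that the prior is meant to correct. Exhaustive enumeration is only practical for a handful of candidates per landmark.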
In order to further enforce geometric constraints among features, we assign each grid point a unique ID, which is called the "Feature ID," as shown in Figure 3.
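One possible way to enumerate the 175 grid points is sketched below. The ordering of the IDs (component first, then row, then column), the reading of 5 × 7 as rows × columns, the grid spacing, and the component coordinates are all assumptions made for illustration; Figure 3 defines the actual layout.

```python
GRID_ROWS, GRID_COLS = 5, 7   # 5 x 7 grid per component, as stated in the text
COMPONENTS = ["left_eye", "right_eye", "nose_tip", "left_mouth", "right_mouth"]


def feature_grid(centers, spacing=4):
    """Map each Feature ID to an (x, y) grid point.

    centers: dict mapping component name -> detected (x, y) position.
    spacing: distance between grid points in pixels (hypothetical value).
    """
    grid = {}
    for c, name in enumerate(COMPONENTS):
        cx, cy = centers[name]
        for r in range(GRID_ROWS):
            for k in range(GRID_COLS):
                fid = c * GRID_ROWS * GRID_COLS + r * GRID_COLS + k
                x = cx + (k - GRID_COLS // 2) * spacing
                y = cy + (r - GRID_ROWS // 2) * spacing
                grid[fid] = (x, y)
    return grid


# Hypothetical detections with an interocular distance of 55 pixels.
centers = {"left_eye": (30, 40), "right_eye": (85, 40), "nose_tip": (57, 70),
           "left_mouth": (40, 95), "right_mouth": (75, 95)}
grid = feature_grid(centers)
print(len(grid))   # 5 components x 35 grid points = 175
```

Because the ID encodes both the component and the position inside its grid, two descriptors with the same Feature ID always refer to the same relative facial location.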

Distributed Transmission Scheme.
In order to carry out face recognition efficiently in WSNs, the resources should be optimally allocated within the distributed network, including sensor energy, transmission bandwidth, memory, and the processing ability. In this work, inspired by the scheme of parallel processing, a distributed transmission method is proposed in the system. The topology of the transmission network is shown in Figure 4, which includes source node capturing the probe face image, FCN nodes which will be explained below, sink node, and other idle nodes. In this work, we consider sensor networks where all sensor nodes in the network are homogeneous and energy constrained and do not change their locations after deployment.
Algorithm 1: K-d Tree Construction
Input: A set of gallery vocabularies, represented by vectors $\{x_i\} \in \mathbb{R}^d$.
Output: A set of virtual links between database nodes in the form of binary K-d trees $\{T_t\}$. Each internal node has a split (dim, val) pair, where dim is the dimension of the visual vocabulary to split on and val is the threshold such that all points in the feature space with $x_i[\mathrm{dim}] \le \mathrm{val}$ belong to the left child and the rest belong to the right child. The leaf nodes have a list of indices to the features that ended up in that node.
Operation: For each tree $T_t$:
(1) Assign all the points $\{x_i\}$ to the root sink node;
(2) For every node in the tree visited in breadth-first order, compute the split as follows:

After the stage of face detection and facial components extraction, the features are transmitted in parallel to multiple neighboring nodes around the source. The choice of cluster for the related neighboring nodes is determined by an energy-preserving protocol, such as LEACH [25]. The net of nodes storing detected facial components is named the facial component net (FCN). Depending on the capacity, either one or a few features are stored in one node. The nodes in the FCN transform the local features from the raw color space to a specific feature space. In this work, the local binary pattern (LBP) [26] is used as the descriptor for the local facial component features. According to [26], LBP has been proven to be highly discriminative, invariant to lighting variation, and computationally efficient for the task of face recognition. In our work, the LBP operator is applied to the 3 × 3 neighborhood of each grid point around the detected face components. Then, through transmission in the FCN, the feature descriptors for different facial parts converge at the sink node, which does further processing before transmitting to the nodes in the database net (DN) for recognition. In the sink node, each facial image is represented by concatenating the histograms of occurrences of quantized LBP descriptors in the order of Feature ID, forming the probe vocabulary. The probe vocabulary is further transmitted to the DN for matching against the gallery images in the database. For all the gallery images in the database, the same five facial components are labeled by hand and undergo the same process. To reduce the processing and memory overheads, the enrollment of faces is performed offline and templates are stored in database nodes.
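As an illustration of the descriptor step, the sketch below computes the basic 8-neighbor LBP code on a 3 × 3 neighborhood and concatenates per-component histograms. The bit ordering, the 256-bin quantization, and the grouping of grid points into one histogram per component are assumptions; the function names are hypothetical.

```python
def lbp_code(img, y, x):
    """Basic 8-neighbor LBP code of pixel (y, x); img is a 2D list of gray values."""
    center = img[y][x]
    # Neighbors visited clockwise from the top-left corner (a convention, not mandated).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img, points, bins=256):
    """Histogram of LBP codes over a list of (y, x) grid points."""
    hist = [0] * bins
    for y, x in points:
        hist[lbp_code(img, y, x)] += 1
    return hist


def vocabulary(img, components):
    """Concatenate histograms; components must already be ordered by Feature ID."""
    vec = []
    for points in components:
        vec.extend(lbp_histogram(img, points))
    return vec


patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
print(lbp_code(patch, 1, 1))  # neighbors 9, 7, 8 are >= 6 -> bits 1, 3, 4 -> 26
```

Each code depends only on comparisons with the center pixel, so adding a constant brightness offset to the patch leaves the code unchanged, which is the source of the lighting tolerance mentioned above.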

Distributed Face Recognition.
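With the gallery and probe vocabulary, the recognition is performed by nearest neighbor search using K-d trees [27], which returns the gallery image minimizing a distance of the form

$$D(\vec{V}^{g}, \vec{V}^{p}) = \sum_{i=1}^{N_V} \left( V^{g}_i - V^{p}_i \right)^2, \quad (2)$$

where $\vec{V}^{g}$ and $\vec{V}^{p}$ denote the gallery and probe vocabularies, respectively, and $N_V$ represents the total length of the vocabulary vectors.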
Taking into consideration the distribution characteristic in WSNs, instead of centralized localization, the gallery images are distributed over all database nodes in the manner of a virtual K-d tree structure, which means that the data transmission among nodes is logically organized in the same way as K-d tree. The algorithms for constructing K-d tree and face image searching are described in detail in Algorithm 1 and Table 1.
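A single-machine sketch of K-d tree construction and nearest neighbor search is given below to make the (dim, val) split convention concrete. Because the split rule is not spelled out in the text, the sketch assumes a common choice: split on the dimension with the largest variance at the median value. It also builds the tree recursively rather than in breadth-first order and uses squared Euclidean distance; these are illustrative choices, and the names build and nearest are hypothetical.

```python
import statistics


def build(points, idx=None, leaf_size=2):
    """Build a K-d tree over points; leaves store lists of point indices."""
    idx = list(range(len(points))) if idx is None else idx
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    dims = range(len(points[0]))
    # Assumed split rule: dimension of largest variance, threshold at the median.
    dim = max(dims, key=lambda d: statistics.pvariance([points[i][d] for i in idx]))
    val = statistics.median_low([points[i][dim] for i in idx])
    left = [i for i in idx if points[i][dim] <= val]    # x[dim] <= val goes left
    right = [i for i in idx if points[i][dim] > val]    # the rest go right
    if not right:                                       # cannot split further
        return {"leaf": idx}
    return {"dim": dim, "val": val,
            "left": build(points, left, leaf_size),
            "right": build(points, right, leaf_size)}


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, points, q, best=None):
    """Return (squared distance, index) of the exact nearest neighbor of q."""
    if "leaf" in node:
        for i in node["leaf"]:
            cand = (sqdist(points[i], q), i)
            if best is None or cand < best:
                best = cand
        return best
    diff = q[node["dim"]] - node["val"]
    near, far = ((node["left"], node["right"]) if diff <= 0
                 else (node["right"], node["left"]))
    best = nearest(near, points, q, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if best is None or diff ** 2 < best[0]:
        best = nearest(far, points, q, best)
    return best


gallery = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]  # toy 2-D vocabularies
tree = build(gallery)
print(nearest(tree, gallery, (9, 2)))  # (2, 4): gallery[4] = (8, 1) is closest
```

In the distributed setting described above, the parent-child links of this structure correspond to virtual links between database nodes rather than pointers in one memory space.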
In this work, two ways to parallelize K-d trees are explored, namely, Independent K-d Trees (IK-dT) and Distributed K-d Trees (DK-dT), which will be described in detail as follows.
(1) IK-dT. The simplest way of parallelization is to divide the image database into subsets, where each subset can fit in the memory of one node. Then each node builds an independent K-d tree for its subset of images. A single root sink node accepts the query image and passes the query features to all the nodes in DN, which then query their own K-d trees.
The root node then collects the results, counts the candidate matches, and outputs the final matched image, as is illustrated by Figure 5.
(2) DK-dT. Build a single K-d tree for the entire gallery database, where the top of the tree resides on a single node, the root node. The bottom part of the tree is divided among a number of leaf nodes, which also store the features that end up in the leaves of these parts. At query time, the root node directs the search to a subset of the leaf nodes depending on where the features exit the tree on the root node. The leaf nodes compute the nearest neighbors within their subtrees and send them back to the root node, which performs the counting and outputs the final result with the highest matching score, as is illustrated by Figure 6. The most obvious advantage of DK-dT is that a single feature goes to only a small subset of the leaf nodes in the whole net, and thus the ensemble of leaf nodes may search for the matches of multiple features simultaneously. This is justified by the fact that most of the computation is performed in the leaf nodes.
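The difference in how many database nodes a query reaches under the two schemes can be sketched as follows. This is a simplified illustration, not the simulated system: each node's local K-d tree is replaced by a brute-force search, the top of the DK-dT tree is reduced to thresholds on a single dimension, and the DK-dT query is routed to exactly one leaf node, which makes it approximate near partition boundaries. All function names are hypothetical.

```python
import bisect


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition_by_dim(gallery, m, dim=0):
    """Split the gallery into m contiguous chunks along one dimension.

    Returns (thresholds, subsets); assumes every chunk is non-empty.
    """
    items = sorted(enumerate(gallery), key=lambda t: t[1][dim])
    size = -(-len(items) // m)                      # ceiling division
    subsets = [items[i * size:(i + 1) * size] for i in range(m)]
    thresholds = [s[-1][1][dim] for s in subsets[:-1]]
    return thresholds, subsets


def local_nearest(subset, q):
    """Brute-force stand-in for the per-node K-d tree search."""
    return min((sqdist(p, q), gid) for gid, p in subset)


def ikdt_query(subsets, q):
    """IK-dT: the root broadcasts q to every database node and merges the answers."""
    answers = [local_nearest(s, q) for s in subsets]
    return min(answers), len(answers)               # (best match, nodes contacted)


def dkdt_query(thresholds, subsets, q, dim=0):
    """DK-dT: the root routes q to the one leaf node whose interval contains q[dim]."""
    node = bisect.bisect_left(thresholds, q[dim])   # q[dim] <= threshold goes left
    return local_nearest(subsets[node], q), 1


gallery = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
thresholds, subsets = partition_by_dim(gallery, m=3)
print(ikdt_query(subsets, (9, 2)))               # ((2, 4), 3): all 3 nodes contacted
print(dkdt_query(thresholds, subsets, (9, 2)))   # ((2, 4), 1): one node contacted
```

Both schemes return the same match for this query, but IK-dT delivers the query feature to every database node while DK-dT delivers it to one, which is the effect the energy analysis quantifies.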
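To further analyze the energy efficiency of DK-dT, we use the energy model of [28] in this work. To transmit an $l$-bit message over a distance $d$, the radio expends energy

$$E_{Tx}(l, d) = \begin{cases} l E_{elec} + l \epsilon_{fs} d^{2}, & d < d_0, \\ l E_{elec} + l \epsilon_{mp} d^{4}, & d \ge d_0. \end{cases} \quad (3)$$

The first term denotes the energy dissipated by the radio electronics, while the second denotes the energy consumed by the radio amplifier. Depending on the distance between the transmitter and receiver relative to a threshold $d_0$, either the free space model $\epsilon_{fs}$ ($d^{2}$ power loss) or the multipath fading model $\epsilon_{mp}$ ($d^{4}$ power loss) is used. When receiving this data, the radio expends energy

$$E_{Rx}(l) = l E_{elec}. \quad (4)$$

Here, we focus on the energy consumption at the receiver end. Since each reception costs $l E_{elec}$, the total is proportional to the number of nodes $n_v$ visited during the search:

$$E_{R} = n_v \cdot l E_{elec}. \quad (5)$$

According to [29], the search complexity in a distributed K-d tree is $O(\log N)$, where $N$ is the total number of subjects in the gallery database. So (5) can be converted to

$$E_{DK} = O(\log N) \cdot l E_{elec}, \quad (6)$$

while in independent K-d trees, the search complexity is $M \cdot O(\log N)$, where $M$ is the total number of subsets of the gallery database. So,

$$E_{IK} = M \cdot O(\log N) \cdot l E_{elec}. \quad (7)$$

For a given probe image, the term $l E_{elec}$ in (5) is a constant, so the receive energy of IK-dT is larger than that of DK-dT by a factor of $M$. The energy efficiency of DK-dT is therefore superior to that of the alternative IK-dT.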
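The energy model can be turned into a small calculator, sketched below. The electronics energy of 50 nJ/bit is the value used in the experiments; everything else is an assumption made for illustration: the O(log N) term is replaced by exactly log2(N) receptions, the payload size is hypothetical, the number of subsets is taken equal to the 50 database nodes of the simulation, and the threshold distance d0 is a placeholder.

```python
import math

E_ELEC = 50e-9   # J/bit, electronics energy used in the experiments


def tx_energy(bits, dist, eps_fs, eps_mp, d0):
    """Energy (J) to transmit `bits` over `dist` meters, first-order radio model."""
    if dist < d0:
        return bits * E_ELEC + bits * eps_fs * dist ** 2   # free space, d^2 loss
    return bits * E_ELEC + bits * eps_mp * dist ** 4       # multipath, d^4 loss


def rx_energy(bits):
    """Energy (J) to receive `bits`."""
    return bits * E_ELEC


def dkdt_receive_energy(bits, n_subjects):
    # Assumption: O(log N) is taken as exactly log2(N) receptions.
    return math.log2(n_subjects) * rx_energy(bits)


def ikdt_receive_energy(bits, n_subjects, n_subsets):
    # Every one of the M subsets performs its own search.
    return n_subsets * math.log2(n_subjects) * rx_energy(bits)


bits = 1400                       # hypothetical payload size
dk = dkdt_receive_energy(bits, n_subjects=500)
ik = ikdt_receive_energy(bits, n_subjects=500, n_subsets=50)
print(ik / dk)                    # equals n_subsets (50) up to rounding

# Transmit cost for a 10 m hop; eps values as reported, d0 is a placeholder.
print(tx_energy(1000, 10, eps_fs=100e-12, eps_mp=100e-12, d0=50))
```

Under these assumptions the ratio between the two schemes depends only on the number of subsets, so the gap widens as the gallery is spread over more database nodes.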

Experiment
To validate the feasibility of the proposed system, we simulate a network with 200 nodes in a play field of 100 m × 100 m, including 150 nodes in the facial components network and the other 50 in the database network. Additionally, 500 subjects in FERET dataset [30] are selected for the task of face recognition. For each subject, one frontal image with regular expression is chosen as the gallery image, one image with alternative expression as the probe image.

Components Based Method versus Full Representation.
In the first experiment, to demonstrate the effect of components based method for face recognition, we compare the recognition accuracy of the components based method with the full representation method in the same experimental settings.
In the components based method, five facial parts are extracted for recognition using the method stated in this paper. In the full representation method, by contrast, the whole detected face is used for further processing. From Figure 7, which shows the cumulative matching score (CMS) curves of the two methods, we can see that the proposed method gains a significant improvement in recognition accuracy over the full representation method. This is due to the fact that, by integrating the global shape constraint, the local facial components detection becomes more tolerant of variations in lighting, pose angle, and facial expression.
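For readers unfamiliar with the CMS curve, the sketch below shows how it can be computed from the rank at which each probe's true identity is retrieved. The distances and ranks are hypothetical and do not reproduce the results in Figure 7; the function names are illustrative.

```python
def rank_of_match(distances, true_id):
    """1-based rank of the true identity; distances maps gallery id -> distance."""
    d_true = distances[true_id]
    return 1 + sum(1 for d in distances.values() if d < d_true)


def cms_curve(ranks, max_rank):
    """Fraction of probes whose true identity appears within the top k, k = 1..max_rank."""
    n = len(ranks)
    return [sum(1 for r in ranks if r <= k) / n for k in range(1, max_rank + 1)]


# Hypothetical ranks for eight probe images.
ranks = [1, 1, 2, 5, 1, 3, 1, 2]
curve = cms_curve(ranks, max_rank=5)
print(curve)  # [0.5, 0.75, 0.875, 0.875, 1.0]
```

The first entry of the curve is the rank-1 recognition rate, and the curve is nondecreasing by construction.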

IK-dT versus DK-dT.
In the second experiment, IK-dT and DK-dT, the two different ways of parallelizing K-d trees, are compared in terms of energy efficiency. In addition to the FERET dataset, another standard database, Labeled Faces in the Wild-a (LFW-a) [31], in which 1,680 of the people pictured have two or more distinct photos, is used to compare the performance of these two methods. For the images in LFW-a, we rescale and crop the images to the same resolution as those in FERET. The related energy parameter $E_{elec}$ is set to 50 nJ/bit, and $\epsilon_{fs}$ and $\epsilon_{mp}$ are set to 100 pJ/bit/m². Figure 8 demonstrates the advantage of DK-dT over IK-dT in energy consumption on both the FERET and LFW-a databases. Instead of transmitting the facial feature to all the database nodes as in IK-dT, DK-dT sends each detected feature to only a small portion of the leaf nodes. As the scale of the enrollment database grows, the superiority of DK-dT will become more obvious.

Comparisons with Other Methods.
In this part, due to the unavailability of the system source code, we implement the methods of Razzak et al. [12], Muraleedharan et al. [13], and Yan and Osadciw [14] ourselves and compare them with our proposed scheme under the DK-dT topology. To compare with these reported methods comprehensively, we evaluate the rank-1 recognition rate and the energy efficiency, adopting the energy model used in our work. The experiment settings are the same as stated above, and 500 subjects in the FERET dataset are selected for recognition in the simulation. Ten probe images are fed into the whole system for recognition every minute, so it takes 50 minutes to cover all 500 images in the database. We run this experiment 10 times, and the average results are shown in Table 1. From Table 1 we can see that the recognition accuracy of our proposed method is comparable to that of the other methods, while its energy efficiency is better.

Discussion and Conclusion
In this work, a face recognition system prototype for WSNs is proposed. Given the resource constraints of sensor nodes, a distributed mechanism based on K-d trees is applied to both face image transmission and retrieval. In addition, two different ways of parallelizing K-d trees are compared. To further guarantee the recognition accuracy under wireless transmission, a robust facial components extraction method is implemented. According to the experimental results, the proposed method is a promising basis for many useful applications in WSNs. For example, users can establish a secure temporary communication network based on face identification. Our proposed method is also well suited to remote wireless surveillance systems, for which energy efficiency is critical.
Some limitations of this work still exist. For the K-d tree structure in our proposed method, the main advantage lies in sharing the load of the face recognition task. To validate this, a larger scale test is needed to demonstrate the superiority of our proposed framework over other methods. Moreover, whether the whole system can continuously provide reliable data gathering, data transmission, facial feature extraction, and recognition is a critical issue. In our work, the K-d tree structure is mainly targeted at balancing the computational load among nodes across the whole network and does not introduce data redundancy into the system. For future work, besides load sharing, storage sharing or a retransmission mechanism will be introduced in order to improve the reliability of the system. In addition, the proposed prototype will be adapted to the specific requirements of the application scenarios stated above to further improve the performance of the whole system.