Face Recognition in the Scrambled Domain via Salience-Aware Ensembles of Many Kernels

With the rapid development of Internet-of-Things (IoT), face scrambling has been proposed for privacy protection during IoT-targeted image/video distribution. Consequently, in these IoT applications, biometric verification needs to be carried out in the scrambled domain, presenting significant challenges in face recognition. Since face models become chaotic signals after scrambling/encryption, a typical solution is to utilize the traditional data-driven face recognition algorithms. While chaotic pattern recognition is still a challenging task, in this paper, we propose a new ensemble approach, many-kernel random discriminant analysis (MK-RDA), to discover discriminative patterns from the chaotic signals. We also incorporate a salience-aware strategy into the proposed ensemble method to handle the chaotic facial patterns in the scrambled domain, where the random selections of features are made on semantic components via salience modeling. In our experiments, the proposed MK-RDA was tested rigorously on three human face data sets: the ORL face data set, the PIE face data set, and the PUBFIG wild face data set. The experimental results successfully demonstrate that the proposed scheme can effectively handle the chaotic signals and significantly improve the recognition accuracy, making our method a promising candidate for secure biometric verification in the emerging IoT applications.

privacy legally during video distribution over the public internet. By scrambling faces detected in private videos, the privacy of subjects can be respected, as shown in Fig.1.
Compared with full encryption methods, face scrambling is a compromise choice because it does not really hide information: unscrambling is usually achievable by a few simple manual tries even when not all the parameters are known. It avoids exposing individual biometric faces without really hiding anything from surveillance video. As shown in Refs. [5~14], scrambling has recently become popular in the research field of visual surveillance, where privacy protection is needed as well as public security. Another advantage of face scrambling over encryption is its computing efficiency, as it is usually far simpler than complicated encryption algorithms. In many business cases such as public surveillance, the purpose is limited to privacy protection from unintentional browsing of user data. Hence, full encryption becomes unnecessary in this context.
There are many ways to perform face scrambling. For example, scrambling can be done simply by masking or cartooning [8]. However, this kind of scrambling simply loses the facial information, and hence subsequent face recognition or verification becomes unsuccessful. Especially for security reasons, it is obviously not a good choice to really erase human faces from surveillance videos. In comparison, the Arnold transform [13,14], as a basic step in many encryption algorithms, is a kind of recoverable scrambling method. Scrambled faces can be unscrambled by several manual tries. Hence, in this work, we have chosen Arnold transform based scrambling as our specific test platform.
Face recognition has been extensively researched in the past decade and significant progress has been seen towards better recognition accuracy in recent reports [15~21]. These approaches usually exploit semantic face models [22~23] where a face is considered as an integration of semantic components (such as eyes, nose and mouth), and hence semantic related sparse features or local binary patterns (LBP) can be effectively used to improve the recognition accuracy. Beyond 2D facial modelling, 3D models [23] can also be exploited for better accuracy by taking advantage of 3D face alignment.
(Figure caption) Semantic approaches such as using AAM [18]~[25] for facial emotion estimation cannot be applied in the scrambled domain.
However, as shown in Fig.2, a scrambled face has a very different appearance from its original facial image. While a face model or 3D model can easily be fitted to a normal facial image, this becomes extremely hard, if not impossible, once the face has been scrambled: in the scrambled domain, semantic facial components simply become chaotic patterns, and it becomes difficult to exploit landmarks or 3D models for better accuracy. As has been discussed in [15], one straightforward way is to use traditional data-driven approaches, where chaotic signals are treated simply as a set of data points spread over manifolds.
Various data-driven face recognition algorithms have been developed over several decades. In the early days, linear dimensionality reduction [24~27] was used for this challenge, such as principal component analysis (PCA) [24], independent component analysis (ICA) [24], and Fisher's linear discriminant analysis (FLD) [25]. With kernel methods (KM) [26], these methods can be extended to a reproducing kernel Hilbert space with a non-linear mapping, and extended as k-PCA and k-FLD. Recent progress on nonlinear manifold learning [27~32] has produced a number of new methods for face recognition, such as Laplacianface [30] and Tensor subspace [31]. These approaches have been successfully used for data-driven face recognition. However, for face recognition in the scrambled domain, we need a robust approach to handle chaotic signals in the scrambled domain, which appear random and beyond human perception.
In recent research, multi-kernelization [32,33] has been proposed to handle the complexity of data structure. The underlying belief is that multiple-view discriminative structures [34,35] need to be discovered, because a manifold may have different geometric shapes in different views. With the hope of utilizing this approach for chaotic signals, in this paper we propose a new approach called Many Kernel Random Discriminant Analysis (MK-RDA) to handle this new challenge of chaotic signal recognition in the scrambled domain. We also propose a mechanism to incorporate a salience model [36] into MK-RDA for pattern discovery from chaotic facial signals, since it is believed that semantic features are usually salient and useful for facial pattern classification.
In the following sections, facial image scrambling using the Arnold transform is introduced in Section II, together with the semantic mapping of facial components for robust feature extraction in the scrambled domain. In Section III, we introduce the background and motivation of our "many kernel" ensemble method, and present our many-kernel random discriminant analysis. In Section IV, we present the framework using MK-RDA with the salience model for chaotic facial pattern verification. Section V gives the experimental results on three face datasets, and conclusions are drawn in Section VI.

A. Face Scrambling
In many IoT applications, it is not encouraged to hide any information by encryption; on the other hand, it is legally required to protect privacy during distribution and browsing. As a result, scrambling becomes a compromise choice because it doesn't really hide information (unscrambling is usually achievable by simple manual attempts), but it does avoid exposing individual faces during transmission over the internet. Additionally, scrambling usually has much lower computation cost than encryption, making it suitable for simple network-targeted applications using low power sensors.
Among various image scrambling methods, the Arnold scrambling algorithm has the feature of simplicity and periodicity. The Arnold transform [11,12] was proposed by V. I. Arnold in the research of ergodic theory; it is also called cat-mapping before it is applied to digital images. It has been widely used in visual surveillance systems where it is favored as a simple and efficient scrambling method which nevertheless retains some spatial coherence. In this paper, we use this scrambling method to set up the test environment of our algorithm in the scrambled face domain.
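For a pixel at $(x, y)$ in an $N \times N$ image, the Arnold transform is defined as
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \bmod N$$
which is called two-dimensional Arnold scrambling. Here, $x$ and $y$ are the coordinates of the original pixel; $N$ is the height or width of the square image processed; $x'$ and $y'$ are the coordinates of the scrambled pixel. The Arnold transform can be applied iteratively as follows:
$$P_{xy}^{k+1} = A\,P_{xy}^{k} \bmod N, \qquad A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$$
Here, the input $P_{xy}^{k}$ is the original image after the $k$-th Arnold transform, and $P_{xy}^{k+1}$ on the left is the output of the $(k+1)$-th Arnold transform. $k$ represents the number of iterations, where $k = 0, 1, 2$ and so on.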
By relocating every pixel on the discrete lattice, the Arnold transform produces a new image after all pixels of the original image have been traversed. In addition, Arnold scrambling is cyclic and reversible. Fig.3-a) shows a face with its facial components (i.e., eyes, nose and mouth) circled by different colors. Fig.3-b) shows the scrambled face after one operation of the Arnold transform, where it can be seen that facial components have drastic displacements. Fig.3-c) and d) show the scrambled faces after two and three operations of the Arnold transform. In comparison with Fig.3-b), the scrambled faces in Fig.3-c) and d) are more difficult to identify by the human eye. In this work, we use three operations of the Arnold transform to scramble all faces.
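As an illustration of the scrambling step, the following is a minimal sketch rather than the implementation used in our experiments. It assumes the standard cat-map form x' = (x + y) mod N, y' = (x + 2y) mod N; the function names `arnold_scramble` and `arnold_unscramble` are hypothetical.

```python
import numpy as np

def arnold_scramble(img, iterations=1):
    """Apply the Arnold (cat map) transform to a square image."""
    n = img.shape[0]
    if img.shape[1] != n:
        raise ValueError("the Arnold transform needs a square image")
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    # Assumed standard form of the map: x' = x + y, y' = x + 2y (mod N).
    x_new = (x + y) % n
    y_new = (x + 2 * y) % n
    out = img
    for _ in range(iterations):
        scrambled = np.empty_like(out)
        scrambled[x_new, y_new] = out      # pixel (x, y) moves to (x', y')
        out = scrambled
    return out

def arnold_unscramble(img, iterations=1):
    """Invert arnold_scramble by reading every pixel back from (x', y')."""
    n = img.shape[0]
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    x_new = (x + y) % n
    y_new = (x + 2 * y) % n
    out = img
    for _ in range(iterations):
        out = out[x_new, y_new]            # restored[x, y] = scrambled[x', y']
    return out

# Three iterations, as used for all faces in this work.
face = np.arange(64).reshape(8, 8)
scrambled_face = arnold_scramble(face, iterations=3)
restored_face = arnold_unscramble(scrambled_face, iterations=3)
```

Because the map is a permutation of pixel positions, no intensity value is lost, which is why the scrambling is recoverable.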
As we can see from Fig.3, before scrambling, facial components can easily be identified by the human eye. After scrambling, the images become chaotic signals, and it is hard to figure out eyes and noses. Since semantic facial components are considered important cues for face recognition, we need to find a way to incorporate semantic approaches into the scrambled domain to attain higher matching accuracy.
In many IoT based applications, it may not be allowed to unscramble detected faces due to privacy-protection policies. Moreover, unscrambling may involve parameters (such as the initial shift coordinates) that are usually unknown by the online software. Facial recognition in the scrambled domain then becomes a necessity in these IoT applications.

B. Semantic Facial Components
Fundamentally a 2-D face image is the projection of a real 3-D face manifold. This viewpoint leads to model-based face recognition, where semantic facial components (such as eyes, nose, and lips) are modeled by their parameters. A very frequently applied face model is the active appearance model (AAM) [20]~ [23]. 3D facial information is better for describing the semantic facial components in the presence of illumination and pose changes, where 2-D descriptors sometimes turn out to be less effective. Hsu and Jain [23] have advocated that such semantic facial components constitute the meaning of a face and decisively form the basis of face recognition.
Along this roadmap, template-based face description [21] has been considered to emphasize the importance of semantic facial components. In our human perception system, concept-level semantic features are more meaningful than pixel-level details. A good emotion estimation model usually relies on the importance of semantic features. Changes in a single pixel or sparse set of pixels should not distort the final decision.
Though semantic approaches have attained great success in facial analysis, they need a robust scheme to map a 2D image into its semantic feature space or 3D deformable model. This computation is not trivial and usually cannot be afforded by many real-world applications such as mobile computing platforms. Besides, the detection of semantic features can be sensitive to different conditions, and hence produce extra errors in face classification. To take advantage of semantic features without worrying about their computational complexity, in this paper we introduce a salience-aware method into our facial analysis.

C. Semantic Salience Mapping of Facial Images
Since semantic components are important cues to identify a specific face, we need to find a way to introduce these factors in statistical face modelling. In this paper, we propose to use salience learning for semantic facial mapping, and incorporate the learned semantic map into a random forest method for face recognition.
As shown in Fig.4-a), facial components are usually salient features in a facial image. In this paper, we employ the Deep Salience model [39] for semantic feature mapping. Unlike other models based on color salience using pixel contrast, this deep salience model bases its algorithm on structural salience, and hence can easily find the semantic components as its salient features, as shown in Fig.4-a). This fits well with our purpose to exploit semantic components in a facial image.
We then apply a Gaussian mixture model (GMM) to summarize the learned salience maps of the training dataset, where the salience distribution is considered as a mixture of Gaussian functions,
$$p(x|\lambda) = \sum_{i} w_i\, g(x|\mu_i, \sigma_i) \qquad (2)$$
where $g(x|\mu_i, \sigma_i)$ is the normalized Gaussian distribution with mean $\mu_i$ and variance $\sigma_i$, $w_i$ is the mixture weight of the $i$-th component, and $\lambda$ denotes the set of model parameters. In our work, we use a two-class GMM model and estimate the probability of a pixel being salient or non-salient. Learning with GMM mixtures can find optimized Gaussian distribution parameters in the GMM model, and consequently produce a distribution map $S = p(x|\lambda)$ from Eq. (2), which is referred to as the semantic importance map in this paper. Fig.4-b) shows the estimated semantic importance map learned from Fig.4-a), which highlights semantic features such as eyes, nose and mouth. This importance map represents the importance of each feature subspace in terms of its relation to semantic features. Fig.4-c) shows the scrambled semantic map. Once we have the semantic salience map of the training dataset, we can then use it to guide the feature sampling to favor semantic features.
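The two-class mixture fitting can be sketched as follows. This is only an illustrative sketch under stated assumptions: it fits a one-dimensional, two-component GMM to the pooled salience values with plain EM, and it takes the posterior of the higher-mean component as the importance of a pixel. Both choices, and the names `fit_gmm_1d` and `salient_posterior`, are assumptions made for this example and not necessarily the exact procedure used to produce Fig.4-b).

```python
import numpy as np

def _densities(v, w, mu, var):
    """Weighted Gaussian densities of v under each of the two components."""
    return w * np.exp(-0.5 * (v[..., None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def fit_gmm_1d(values, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    v = np.asarray(values, dtype=float).ravel()
    mu = np.array([v.min(), v.max()])          # start from the two extremes
    var = np.full(2, v.var() + 1e-6)
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        dens = _densities(v, w, mu, var)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: update weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / nk.sum()
        mu = (resp * v[:, None]).sum(axis=0) / nk
        var = (resp * (v[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var

def salient_posterior(salience_map, w, mu, var):
    """Per-pixel probability of belonging to the higher-mean (salient) component."""
    dens = _densities(np.asarray(salience_map, dtype=float), w, mu, var)
    salient = int(np.argmax(mu))
    return dens[..., salient] / (dens.sum(axis=-1) + 1e-300)

# Synthetic salience map: a bright square on a dark background.
rng = np.random.default_rng(0)
salience = rng.normal(0.1, 0.03, size=(32, 32))
salience[10:22, 10:22] = rng.normal(0.8, 0.05, size=(12, 12))
w, mu, var = fit_gmm_1d(salience)
importance = salient_posterior(salience, w, mu, var)
```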

A. Background on Multi-Kernel Approaches
In many real world applications such as face recognition and image classification, the data often has very high dimensionality. Procedures that are computationally or analytically manageable in low-dimensional spaces can become completely impractical in a space having several thousand dimensions. This has been well known in machine learning as a notorious issue, the "Curse of Dimensionality" [1~3]. To tackle this challenge, various techniques [1~12] have been developed for reducing the dimensionality of the feature space, in the hope of obtaining a more manageable problem. Dimensionality reduction has become an especially important step for face classification.
Various algorithms have been developed for image-based face recognition. In this paradigm, dimensionality reduction [19] has always been a primary concern. As mentioned previously, methods developed for this challenge include principal component analysis (PCA) [24], independent component analysis (ICA) [24], and Fisher's linear discriminant analysis (FLD) [25]. With kernel methods (KM) [26], these methods can be extended to a reproducing kernel Hilbert space with a non-linear mapping, and extended as k-PCA, k-ICA and k-FLD. Recent progress on nonlinear manifold learning [27~31] has led to a number of new methods for face recognition, such as Laplacianface [35], Tensor subspace [36], non-negative matrix factorization [37], and local Fisher discriminant analysis (LFDA) [38,22]. These approaches usually assume there is an underlying discriminative structure to discover, which leads to the paradigm of manifold learning.
Recently, the multi-view problem has been noticed by the research community, where the same manifold can have different shapes in different subspaces, as shown in Fig.5-a).
Foster et al. have employed canonical correlation analysis (CCA) [32] to derive the low dimensional embedding of two-view data and to compute the regression function based on the embedding. Hedge et al. [33] proposed a multiple projection approach from the same manifold. Hou et al. [34] used the pairwise constraints to derive embedding in multiple views with linear transformation. Xia et al. [35] combined spectral embedding with the multi-view issue. Han et al. [36] proposed a sparse unsupervised dimensionality reduction to obtain a sparse representation for multi-view data. Lin et al. [37] proposed multiple kernel learning of a manifold, where various kernel spaces are constructed with different sets of parameters. Zien et al. [38] considered multiple kernels with regard to multi-class cases.
In the multi-view problem, as shown in Fig.5-a), although a manifold has different forms in different subspaces, these forms can always be unified as the same manifold in a higher-dimensional subspace. However, this may not always be true. As shown in Fig.5-b), when the sequence of data points in the second subspace is shuffled, the combination of two submanifolds simply creates a noise-like distribution. This means two submanifolds cannot be merged at all. In this case we have to treat it as a multiple or even "many manifold" problem, where multiple manifold structures need to be discovered.
In our facial recognition in the scrambled domain, facial images become chaotic signals, as shown in Fig.1 and Fig.2. In this real-world case, its underlying discriminative structures could be more like the case in Fig.5-b), where multiple manifold structures need to be discovered. In this paper, we include this case in our consideration and propose a new many-kernel approach to handle its complexity. Before we go further, we give an introduction to kernel based analysis.

B. Preliminary on Kernel based Discriminant Analysis (KDA)
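For a set of data points $\{x_i\} \in \mathbb{R}^N$, we may select a set of data points as the landmarks $\{L_j\}$, $j = 1, \dots, M$, that can characterize this dataset. A data point on the manifold then can be located by its kernel distance to the landmarks:
$$\kappa_i = [\,k(x_i, L_1),\; k(x_i, L_2),\; \dots,\; k(x_i, L_M)\,]^T$$
where $k(\cdot,\cdot)$ is the kernel function. Discriminant analysis in this kernel space then seeks
$$\Phi = \arg\max_{\Phi} \frac{|\Phi S_B \Phi^T|}{|\Phi S_W \Phi^T|} \qquad (5)$$
where $\Phi$ is the projection matrix, and $S_B$ is the between-class covariance matrix:
$$S_B = \sum_{c} n_c\,(\bar{\kappa}_c - \bar{\kappa})(\bar{\kappa}_c - \bar{\kappa})^T \qquad (6)$$
and $S_W$ is the within-class covariance matrix:
$$S_W = \sum_{c} \sum_{i \in c} (\kappa_i - \bar{\kappa}_c)(\kappa_i - \bar{\kappa}_c)^T \qquad (7)$$
Here $n_c$ is the number of samples in class $c$, $\bar{\kappa}_c$ is the mean of class $c$, and $\bar{\kappa}$ is the mean of all samples. By optimizing over Eq. (5), we then have the Eigen projection matrix $\Phi$, and each data point is then represented by its new coordinates in the KDA space:
$$y_i = \Phi\,\kappa_i \qquad (8)$$
Here, $\Phi$ is an Eigen matrix in $\mathbb{R}^{D \times M}$, $y_i \in \mathbb{R}^D$, and $D$ is usually a number smaller than $M$ as well as smaller than the number of classes in the training dataset $\{x_i\}$.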
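The landmark-based kernel representation followed by discriminant analysis can be sketched as below. This is a hedged illustration: the Gaussian (RBF) kernel, the ridge term `reg`, the choice of every fifth training sample as a landmark, and the function names are assumptions made for the example; the text above does not fix any of them.

```python
import numpy as np

def kernel_features(X, landmarks, gamma):
    """Gaussian (RBF) kernel value between every sample and every landmark."""
    sq_dist = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)             # shape: (n_samples, M)

def fit_kda_projection(K, labels, n_components, reg=1e-3):
    """Return Phi (n_components x M) maximising between- over within-class scatter."""
    M = K.shape[1]
    mean_all = K.mean(axis=0)
    S_b = np.zeros((M, M))
    S_w = reg * np.eye(M)                       # small ridge keeps S_w invertible
    for c in np.unique(labels):
        Kc = K[labels == c]
        mean_c = Kc.mean(axis=0)
        diff = (mean_c - mean_all)[:, None]
        S_b += len(Kc) * (diff @ diff.T)
        S_w += (Kc - mean_c).T @ (Kc - mean_c)
    # Solve S_b w = lambda * S_w w by whitening with the Cholesky factor of S_w.
    L_inv = np.linalg.inv(np.linalg.cholesky(S_w))
    evals, evecs = np.linalg.eigh(L_inv @ S_b @ L_inv.T)
    top = evecs[:, np.argsort(evals)[::-1][:n_components]]
    return (L_inv.T @ top).T

# Toy data: three classes in five dimensions.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 30)
X = rng.normal(size=(90, 5)) + 3.0 * labels[:, None]
landmarks = X[::5]                              # every fifth sample as a landmark
K = kernel_features(X, landmarks, gamma=0.05)
Phi = fit_kda_projection(K, labels, n_components=2)
Y = K @ Phi.T                                   # row i holds y_i = Phi kappa_i
```

With three classes, at most two discriminant directions carry class information, which matches the remark that D is smaller than the number of classes.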

C. Many Kernels for the Many Manifold Problem
Though it has been assumed in many methods that there is only one underlying manifold structure, it is obvious that there can often be multiple manifolds underlying many real-world datasets, as shown in Fig.5-b). However, the discovery of the underlying manifold structures is an inverse engineering problem that could be very complex, and often intractable. For N-dimensional data, there are $K = \binom{N}{M} = \frac{N!}{M!\,(N-M)!}$ different selections of M features that can be made, and within each selection an independent sub-manifold may be discovered. For example, when N=10 and M=5, K will be 252. For a facial image, there could be 64×64=4096 dimensions, and M could be any number. Hence, the estimation of possible subspaces becomes an NP-hard problem that cannot be handled exhaustively in realistic computing time. Consequently, the discovery of "many manifolds" becomes a major challenge that has not yet been fully appreciated.
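The combinatorial growth can be checked with a few lines of code. The sketch below reproduces the N=10, M=5 example; the value M=64 for a 4096-dimensional image is a hypothetical choice used only to show the order of magnitude.

```python
from math import comb

# Number of ways to choose M features out of N dimensions.
small_case = comb(10, 5)
print(small_case)                      # 252, the N=10, M=5 example

# For a 64x64 face image (N=4096) with a hypothetical M=64,
# the count is far too large to enumerate exhaustively.
large_case = comb(4096, 64)
print(len(str(large_case)), "decimal digits")
```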
In this work, to address the challenge shown in Fig.5-b), we propose a randomization strategy to generate "many kernels" and try to cover as many manifolds as possible in a given dataset by chance, which reduces the complexity of the "many manifolds" problem from its exponential computing time to something manageable.

D. Many Kernels from Random Feature Selection
If we have K data points $\{x_i\}$, then typically the random selection of subspaces can be easily attained by generating a list of random numbers $l^k$, and selecting $K_L$ features to construct the new datasets:
$$z_j^k = x_j(l^k) \qquad (9)$$
Here, $\{z_j^k\} \in \mathbb{R}^{K_L}$. Then we can construct a kernel space based on this randomly selected subspace, in the same way as in the kernel-based analysis above but with $z_j^k$ in place of $x_j$, which gives the kernel representation $\kappa_i^k$. We can repetitively redo the above randomization process, and as a result, we can easily construct as many kernels as we want.
If we have $L_K$ kernels and each kernel has $K_L$ dimensions, then for each data point $x_i$, we will have the kernel representation $\{\kappa_i^k\}$ actually as an $L_K \times K_L$ matrix. To guarantee the kernelized dimensions are not too much more than the original data dimensions, we add a constraint:
$$L_K \times K_L = N \qquad (11)$$
where $N$ is the dimensionality of the original data, which means the "many kernel" process will not increase or decrease the dimensions. This process is outlined in List I.
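The randomization process described above can be sketched as follows. This is an illustrative sketch only: the Gaussian kernel, the random choice of landmarks among the training samples, the default kernel width, and the name `build_many_kernels` are assumptions for the example. The number of features per kernel is set to the original dimensionality divided by the number of kernels, so that the total kernelized dimension does not exceed the original one.

```python
import numpy as np

def build_many_kernels(X, n_kernels, gamma=None, seed=None):
    """Build n_kernels kernel representations from random feature subsets."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    k_l = n_features // n_kernels              # features (and dimensions) per kernel
    kernels = []
    for _ in range(n_kernels):
        # Random list of feature indices defining this subspace.
        feat_idx = rng.choice(n_features, size=k_l, replace=False)
        Z = X[:, feat_idx]
        # Assumption: landmarks are random training samples in the same subspace.
        lm_idx = rng.choice(n_samples, size=min(k_l, n_samples), replace=False)
        landmarks = Z[lm_idx]
        g = gamma if gamma is not None else 1.0 / k_l
        sq_dist = ((Z[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
        kernels.append({
            "features": feat_idx,              # needed to project test samples later
            "landmarks": landmarks,
            "kappa": np.exp(-g * sq_dist),     # shape: (n_samples, number of landmarks)
        })
    return kernels

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 64))                  # 40 samples, 64 features
many_kernels = build_many_kernels(X, n_kernels=8, seed=1)
```

Keeping the feature indices and landmarks of each kernel allows a test sample to be mapped into exactly the same subspaces at verification time.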

E. Many-Kernel Random Discriminant Analysis
The purpose of this many-kernel strategy is to find the underlying discriminative structures in each subspace. After we obtain the many kernel based representation $\kappa_i^k$, we can then apply discriminant analysis over each kernel subspace and find its discriminative projection.
Fig.6. Selecting kernel subspaces toward semantic features. a) … Fig.4-c). b) The corresponding pixels on the original facial image. c) Actual hit rates in the scrambled domain. d) The hit map unscrambled back to the facial domain.
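For a set of training data and its kernel representation $\{\kappa_i^k\}$, we can calculate its within-class covariance at its k-th kernel subspace as:
$$S_W^k = \sum_{c} \sum_{i \in c} (\kappa_i^k - \bar{\kappa}_c^k)(\kappa_i^k - \bar{\kappa}_c^k)^T \qquad (12)$$
and its between-class covariance matrix:
$$S_B^k = \sum_{c} n_c\,(\bar{\kappa}_c^k - \bar{\kappa}^k)(\bar{\kappa}_c^k - \bar{\kappa}^k)^T \qquad (13)$$
where $n_c$ is the number of samples in class $c$, $\bar{\kappa}_c^k$ is the class mean, and $\bar{\kappa}^k$ is the overall mean in the k-th kernel subspace. To find the most discriminative features, we can maximize its between-class covariance over its within-class one by finding a projection matrix $\Phi^k$:
$$\Phi^k = \arg\max_{\Phi} \frac{|\Phi S_B^k \Phi^T|}{|\Phi S_W^k \Phi^T|} \qquad (14)$$
By optimizing over Eq. (14), we then have the Eigen projection matrix $\Phi^k \in \mathbb{R}^{D \times K_L}$. For each data point $\kappa_i^k$, we can then have its discriminant projection in its k-th subspace:
$$y_i^k = \Phi^k \kappa_i^k \qquad (15)$$
For each kernel subspace, we can obtain the kernel discriminant projection for each data point. As a result, we will have the $L_K$ projections:
$$Y \sim \{y_i^k\} \qquad (16)$$
where $Y$ will be a matrix in $\mathbb{R}^{D \times L_K}$.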

A. Salience-Biased Feature Space Reconstruction
Unsurprisingly, salient features usually play an important role in face classification. Therefore, rationally we can expect a mechanism to give salient features more weight than others. In this work, we consider a biased strategy to reconstruct the feature space to favor semantic salient features.
Considering a scrambled facial image x as a vector of facial features/signals $\{f_1, f_2, \dots, f_k, \dots\}$, and a semantic salience map $S \sim \{s_1, s_2, \dots, s_k, \dots\}$ learned from training (as shown in Fig.4-c), we can then construct a new feature space by replicating each feature according to its semantic importance. Assuming the maximum multiplicative factor as $K_s$, the repetition $k_i$ of each feature is then defined by Eq. (17) as a function of $K_s$ and the salience value $s_i$. Here, $k_i$ means how many times the i-th feature/signal will be repeated, and $s_i$ is the salience value of the i-th signal shown in Fig.4-c). Consequently, we have a new set of features:
$$\chi^{new} = \{\underbrace{f_1, \dots, f_1}_{k_1},\; \underbrace{f_2, \dots, f_2}_{k_2},\; \dots,\; \underbrace{f_i, \dots, f_i}_{k_i},\; \dots\} \qquad (18)$$
With the above multiplicative process, salient features will have a higher likelihood to be chosen in the randomized selection process in Eq. (9).
We can then apply the random selection to select subspaces from the reconstructed feature space $\chi^{new}$ to form the "many kernels" for MK-RDA. Fig.6 shows the results of such a salience-guided selection using the scrambled salience map in Fig.4-c). We can see that with the salience guiding, semantic facial features will be more likely to be used to form our kernel subspaces.
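The salience-biased reconstruction of the feature space can be sketched as follows. Since the exact form of Eq. (17) is not reproduced here, the sketch assumes the repetition count k_i = 1 + round(K_s · s_i) with s_i normalized to [0, 1], which is only one form consistent with the statement that K_s = 0 means no bias. The function name `replicate_by_salience` is hypothetical.

```python
import numpy as np

def replicate_by_salience(X, salience, k_s):
    """Replicate feature columns of X according to their salience.

    X        : (n_samples, n_features) scrambled feature vectors
    salience : (n_features,) scrambled salience values
    k_s      : maximum extra multiplicative factor (0 means no bias)
    """
    s = np.asarray(salience, dtype=float).ravel()
    if s.max() > 0:
        s = s / s.max()                         # normalise salience to [0, 1]
    # Assumed repetition rule: every feature appears at least once.
    repeats = 1 + np.rint(k_s * s).astype(int)
    index_pool = np.repeat(np.arange(s.size), repeats)
    return X[:, index_pool], index_pool, repeats

# Toy example: three features with increasing salience.
X = np.array([[10.0, 20.0, 30.0],
              [11.0, 21.0, 31.0]])
salience = np.array([0.0, 0.5, 1.0])
X_new, pool, repeats = replicate_by_salience(X, salience, k_s=2)
# Uniform random selection from X_new now favours the salient columns.
```

Drawing feature indices uniformly from the enlarged pool is equivalent to sampling the original features with probability proportional to their repetition counts.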

B. Salience-Aware MK-RDA
After the feature space is reconstructed, we can apply MK-RDA on the reconstructed datasets $\{\chi_i\}$ instead of $\{x_i\}$, and obtain the projections $y_i^k$ in each kernel subspace as before. At the end, we will have $Y$ as a matrix in $\mathbb{R}^{D \times L_K}$.
For any two data points $x_1$ and $x_2$, their distance in the projected subspaces is calculated as the Euclidean norm $\|\cdot\|$ of the difference between their projections. For data classification, the likelihood $P(c|\Phi^k)$ that an input data point x belongs to a class c (c = 1, 2, ..., $n_c$) is estimated from its distances to all training data points in the k-th learned kernel subspace with projection $\Phi^k$. For all kernels, the discriminant function of Eq. (22) combines these per-kernel likelihoods, and the decision rule is to assign x to the class c for which the discriminant function is the maximum.

C. Overview of the Salience-Aware Scheme
Fig.7 gives an overview of the proposed salience-aware scheme for scrambled face verification. Given a training dataset, faces are forwarded to the training procedure. The offline procedure then learns its semantic salience map. Following this, the database is scrambled and the feature space is reconstructed by multiplying salient features according to their semantic salience weights. Random sampling is then applied to select features sparsely to construct as many kernels as is allowed, and discriminant analysis is used to learn a kernel subspace for each kernel.
After a scrambled facial image is input as a test, the input is projected into each kernel subspace, and the distance to each training sample is computed. The decision procedure is based on the combination of all kernel subspaces via Eq. (22).
Fig.7. Overview of the proposed salience-aware scheme.
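The combination of all kernel subspaces in the decision step can be sketched as below. The sketch is illustrative only and rests on two assumptions that are not taken from the text: the per-kernel likelihood is derived from the distance to the nearest training sample of each class through a normalized exponential, and the per-kernel likelihoods are combined by summation. The function names are hypothetical.

```python
import numpy as np

def class_likelihoods(y_test, Y_train, labels, classes):
    """Assumed per-kernel likelihood from nearest-sample distances per class."""
    d = np.linalg.norm(Y_train - y_test, axis=1)       # Euclidean distances
    nearest = np.array([d[labels == c].min() for c in classes])
    score = np.exp(-(nearest - nearest.min()))         # closer class, higher score
    return score / score.sum()

def ensemble_decide(test_projs, train_projs, labels):
    """Sum the per-kernel likelihoods and pick the class with the maximum."""
    classes = np.unique(labels)
    total = np.zeros(len(classes))
    for y_test, Y_train in zip(test_projs, train_projs):
        total += class_likelihoods(y_test, Y_train, labels, classes)
    return classes[np.argmax(total)], total / len(test_projs)

# Two kernel subspaces, four training samples from two classes.
labels = np.array([0, 0, 1, 1])
train_projs = [
    np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]),
    np.array([[0.0], [0.2], [2.0], [2.1]]),
]
test_projs = [np.array([0.9, 1.0]), np.array([1.9])]
predicted, avg_likelihood = ensemble_decide(test_projs, train_projs, labels)
```

With a single training sample per nearest-class distance, this reduces in each subspace to a nearest neighbor rule, in line with the simple classifier used in the evaluation.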
It is noted that we can have unscrambled images (mainly for statistical salience learning) in the offline training because offline training is carried out centrally with authorities'/business supervisors' permission and will not undermine users' privacy. Privacy protection is mainly an issue with distribution over the internet.
In this scheme, the training procedure can be carried out offline. The online verification then becomes purely a data-driven process. In the test procedure, all test images and semantic maps are scrambled for privacy protection, and no original face will be utilized for recognition purposes. Hence, it is similar to other data-driven approaches, and is simple and straightforward.

D. Discussion of Salience-Aware MK-RDA
Before we proceed to our benchmark experiments, there are two questions that need to be answered. First, in the MK-RDA mechanism, what is the best LK to choose? Namely, how many kernels are enough? Second, in the above salience-aware mechanism, can such a salience biased mechanism really help attain better accuracy in face recognition? Here, we design an experiment to find out the answers to these two questions.
For this experiment, we chose the Yale face dataset [40] for our tests. In the Yale dataset, each of the 15 subjects has 11 sample faces with different expression, illumination and glasses configuration. We only choose 6 faces with different expressions for our test, as shown in Fig.8-a). With this small dataset, we carried out the face recognition tests by splitting the small dataset into training and test datasets, where the training dataset has five subjects and test dataset has the rest. We then varied LK, the number of kernels, and Ks, the max weight of salience map, in our experiments. We then examined which set of parameters gives the best error rates. Fig.8 shows the results of our experiment. Fig.8-b) gives the experiment results on the number of kernels. Given Ks as 1.5, the number of kernels varied from 5 to 60. We can see that the error rate is lowest when LK is around 32. Basically, more kernels mean more computing time. As long as we have a low error rate, using fewer kernels is often preferable. It is also observed that compared with the baseline kLDA, MK-RDA has attained marginally better accuracy.
We then ran an experiment on Ks. As shown in Eq. (17), Ks=0 means no bias. The bigger Ks is, the more biased it is toward the salient features. Fig.8-c) shows the experimental results. It can be seen that the error rate is lowest when Ks is around 2.5. It is also observed that biased sampling with higher Ks simply worsens the accuracy because it means some non-salient features may be abandoned in the random process even though they may contribute to the recognition process.

V. EXPERIMENTS
To validate our algorithm, we implemented our face recognition method in Matlab, and ran it on a PC with a 2.5GHz dual-core Intel CPU. Before running the benchmark on face datasets, all images in the datasets were scrambled using the (triple) Arnold transform [7~8]. Fig.9 shows selected face images from the three datasets: ORL, PIE and PUBFIG.
The ORL database has 40 subjects, each with 10 faces at different poses. In total, 400 faces are used for the test. The CMU PIE database [40] has 41,368 faces, comprising 67 classes with about 170 faces per class, including a wide spectrum of variations in terms of pose, illumination, expression and appearance. In our tests, we use 50 faces per subject, similar to [30] and [31].
The PUBFIG database [42] contains wild faces selected from the internet. It is very similar to LFW [43] but it provides standard cropped faces. As has been shown [43], background textures in LFW can help attain a higher accuracy. Since we consider face recognition only, PUBFIG fits better with our purpose.
In many previous reports [9], the leave-one-out test scheme … For a leave-k-out scheme, there are usually $C_N^k$ choices. In our experiment, we just chose 3 sets of consecutive faces from N samples, starting at N/4, N/2 and 3N/4. As a result, we have 3 sets of tests in turn for a leave-k-out experiment. The final accuracy is given by the average of all three tests. It is noted that the consecutive splitting will usually produce a large difference between test and training datasets, because the datasets have faces varied consecutively and the first k faces are usually very different from the last (N-k) faces.
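The consecutive leave-k-out splitting can be sketched as follows. The sketch assumes integer division for the three starting positions and wraps around the end of the sample list when a block of k faces would run past the last face; both details, and the name `leave_k_out_splits`, are assumptions for illustration.

```python
def leave_k_out_splits(n, k):
    """Return three (train_indices, test_indices) splits for one subject.

    Each test block holds k consecutive faces, starting at n/4, n/2 and 3n/4.
    """
    splits = []
    for start in (n // 4, n // 2, (3 * n) // 4):
        # Assumption: indices wrap around when the block passes the last face.
        test_idx = [(start + i) % n for i in range(k)]
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        splits.append((train_idx, test_idx))
    return splits

# ORL-style example: 10 faces per subject, leave-3-out.
for train_idx, test_idx in leave_k_out_splits(10, 3):
    print("test:", test_idx, "train:", train_idx)
```

The reported accuracy for a given k would then be the average over the three splits.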
Our benchmark tests aim to verify whether or not the proposed MK-RDA can enhance the accuracy on scrambled face recognition. Our approach is a pure data-driven face classification method. Hence, similar to Ref. [15], we compared our approach with a number of typical data-driven methods, including Eigenface [25], Fisherface [25], kPCA [26], kLDA [26], and Laplacianface (LPP) [31], each applied to facial images in the scrambled domain. In the evaluation of the proposed scheme, we simply use the nearest neighbor classifier because any involvement of any other methods may blur the comparison and we then cannot easily assert if the enhancement comes from our MK-RDA scheme or any other underlying more complicated classifiers.

A. Tests on the ORL Dataset
The ORL database has 10 faces per subject. In our leave-k-out test, k varies from 1 to 6. In total, each k-test has 3 subtests, with different selections of query faces from 10 faces. The final accuracy is the average on all subtests. Fig.10-a) shows all leave-k-out tests, where k varies from 1 to 6. We can see that the proposed MK-RDA attained the best accuracy in all k tests. Fig.10-b) lists the overall accuracy obtained by averaging all k tests. Here, we included PCA, LDA, kPCA, kLDA and LPP for comparison because they are typical data-driven face recognition methods based on dimensionality reduction. We can see that our MK-RDA attained the best overall accuracy, around 95.7%, averaged over all k-tests. In comparison, LPP attained 91.5%, kLDA 93.3%, LDA 93.6%, and kPCA and PCA attained 87.5%.

B. Tests on the PIE Dataset
In our experiment, we used 50 faces per subject and in total 3350 faces were used in our leave-k-out experiment. In this test scheme, k faces from N samples per subject are selected as test samples, and the rest are used as training samples. Fig.11 gives the test results on the PIE dataset. Fig.11-a) shows all leave-k-out tests, where k varies from 5 to 25. We can see that the proposed MK-RDA attained the best accuracy in all k tests. However, when k is increased, fewer samples are left for training and as a result the accuracy drops in all methods. LDA attained 80.0%, kLDA got a better score of 81.5%, and LPP has the second best accuracy of 83.1%. In comparison, our MK-RDA attained the best accuracy of 91.5%, clearly better than the other data-driven approaches.
Fig.11. Leave-k-out tests on PIE dataset.

C. Tests on PUBFIG Dataset
The PUBFIG dataset is designed to compare various algorithms against the human vision system. Its typical benchmark test can have as many as 20,000 pairs of faces for comparison. However, in the IoT-targeted scrambled domain, human perception can barely recognize any scrambled faces, making it meaningless to carry out this human-compared test. On the other hand, in the scenarios of IoT applications, we usually have training datasets on the server side, which makes the task more like a leave-k-out experiment. For this reason, we need to design a new evaluation scheme.
In our experiment, we selected 52 subjects with 60 faces each, and split them randomly into test and training datasets, with each having 30×52=1560 faces. We then tested all data-driven methods by comparing each test face against all training faces. In total, we have 1560×1560=2.4 million pairs for testing. Here we use two criteria to evaluate our experiment. One is the rank-1 accuracy versus dimensionality. The other is the true positive (TP) versus the false positive (FP). Fig.12-a) shows the accuracy versus dimensionality. It is shown that the proposed MK-RDA attained marginally better accuracy-dimensionality performance, which is consistent with the underlying conjecture that the proposed many kernels method may help capture the intrinsic multiple manifolds lying under the given dataset, as discussed in Section III. Fig.12-b) gives the results on TP-FP curves. Here, we obtained a likelihood matrix of 1560×1560 elements by comparing each test sample against all training samples. Then we applied varying thresholds on the likelihood matrix, and counted how many pairs classified as positive are false positive and true positive pairs. From the results shown in Fig.12-b), it is observed that PCA has the worst performance, nearly no different from random guessing. From the comparison, we can see that the proposed MK-RDA has clearly better performance on the true/false positive tests, with consistently better true positive rates (TPR) than other data-driven face recognition methods.
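The thresholding of the likelihood matrix can be sketched as follows. This is an illustrative sketch in which a pair is counted as positive when its likelihood is at or above the threshold, and the thresholds are evenly spaced between the minimum and maximum likelihood; the function name `tp_fp_curve` and these conventions are assumptions.

```python
import numpy as np

def tp_fp_curve(likelihood, test_labels, train_labels, n_thresholds=50):
    """True/false positive rates of pairwise verification over varying thresholds.

    likelihood[i, j] scores how likely test face i and training face j match.
    """
    same = test_labels[:, None] == train_labels[None, :]   # ground-truth matches
    thresholds = np.linspace(likelihood.min(), likelihood.max(), n_thresholds)
    tpr, fpr = [], []
    for t in thresholds:
        positive = likelihood >= t                          # pairs declared a match
        tpr.append((positive & same).sum() / same.sum())
        fpr.append((positive & ~same).sum() / (~same).sum())
    return np.array(fpr), np.array(tpr), thresholds

# Toy example with well separated scores for matching and non-matching pairs.
rng = np.random.default_rng(0)
test_labels = np.array([0, 0, 1, 1])
train_labels = np.array([0, 1, 0, 1])
match = test_labels[:, None] == train_labels[None, :]
likelihood = 0.9 * match + rng.uniform(0.0, 0.1, size=match.shape)
fpr, tpr, thresholds = tp_fp_curve(likelihood, test_labels, train_labels)
```

Plotting `tpr` against `fpr` gives a curve of the kind compared in Fig.12-b); a method no better than random guessing would lie close to the diagonal.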

VI. CONCLUSION
In conclusion, we have identified a new challenge in scrambled face recognition originating from the need for biometric verification in emerging IoT applications, and developed a salience-aware face recognition scheme that can work with chaotic patterns in the scrambled domain. In our method, we conjectured that scrambled facial recognition could generate a new problem in which "many manifolds" need to be discovered for discriminating these chaotic signals, and we proposed a new ensemble approach, Many-Kernel Random Discriminant Analysis (MK-RDA), for scrambled face recognition. We also incorporated a salience-aware strategy into the proposed ensemble method to handle chaotic facial patterns in the scrambled domain, where random selection of features is biased towards semantic components via salience modelling. In our experiments, the proposed MK-RDA was tested rigorously on three standard human face datasets. The experimental results successfully validated that the proposed scheme can effectively handle chaotic signals and drastically improve the recognition accuracy, making our method a promising candidate for emerging IoT applications.