Feature Fusion Methods for Indexing and Retrieval of Biometric Data: Application to Face Recognition with Privacy Protection

Computationally efficient, accurate, and privacy-preserving data storage and retrieval are among the key challenges faced by practical deployments of biometric identification systems worldwide. In this work, a method of protected indexing of biometric data is presented. By utilising feature-level fusion of intelligently paired templates, a multi-stage search structure is created. During retrieval, the list of potential candidate identities is successively pre-filtered, thereby reducing the number of template comparisons necessary for a biometric identification transaction. Protection of the biometric probe templates, as well as the stored reference templates and the created index is carried out using homomorphic encryption. The proposed method is extensively evaluated in closed-set and open-set identification scenarios on publicly available databases using two state-of-the-art open-source face recognition systems. With respect to a typical baseline algorithm utilising an exhaustive search-based retrieval algorithm, the proposed method enables a reduction of the computational workload associated with a biometric identification transaction by 90%, while simultaneously suffering no degradation of the biometric performance. Furthermore, by facilitating a seamless integration of template protection with open-source homomorphic encryption libraries, the proposed method guarantees unlinkability, irreversibility, and renewability of the protected biometric data.


I. INTRODUCTION
Personal, commercial, and governmental identity management systems increasingly rely on biometric technologies, which enable reliable recognition of individuals based on highly distinctive characteristics of human beings, e.g. face or fingerprints. Applications ranging from personal device access [1], border control [2]-[4], forensic investigations and law enforcement [5]-[7], national ID systems [8], [9], and voter registration [10], [11] benefit from the use of biometrics. The largest systems of this kind have enrolled hundreds of millions or even more than a billion subjects (see e.g. [12]), with the global market value of biometric technologies currently estimated to be tens of billions of dollars [13].
As the prevalence, size, and scope of the operational biometric systems increase, the development of technologies which are capable of accurately and efficiently processing biometric data becomes critically important. In the challenging identification and duplicate enrolment check scenarios, where typically an exhaustive search (i.e. one-to-many comparison) is needed, solutions which facilitate practical system response times are indispensable. Rather than merely scaling the hardware architecture, which is associated with high monetary costs, algorithmic methods (such as indexing) referred to as biometric workload reduction [14] can be used to speed up the search queries (and hence reduce the monetary costs). In recent years, strong governmental interest in such methods has been manifested through numerous benchmarks and competitions [15]-[17].
In addition to the aforementioned practical requirements pertaining to biometric performance and computational efficiency, preventing misuse (e.g. privacy violations) of the stored biometric reference data is essential. Existing privacy regulations, e.g. the General Data Protection Regulation (GDPR) [18], classify biometric data under "special categories of personal data", thus entailing significant responsibilities for the data controllers. Traditional encryption methods are unsuitable for protecting biometric data, since biometric characteristics exhibit a natural intra-class variance. If traditional cryptographic techniques are applied to biometric templates, said biometric variance prevents a biometric comparison in the encrypted domain. That is, the use of conventional cryptographic methods would require a decryption of protected biometric data prior to the comparison. In contrast, biometric template protection [19]-[21] enables a comparison of biometric data in the encrypted domain and hence a permanent protection of biometric data. Biometric template protection schemes use auxiliary data to obtain pseudonymous identifiers from unprotected biometric data. Biometric comparisons are then performed via pseudonymous identifiers while unprotected biometric reference data is discarded [22]. Biometric template protection schemes have hardly been employed in biometric identification systems [14]. One reason for this is that many types of biometric template protection schemes require complex comparison methods, which renders them unsuitable for biometric identification (where the workload is dominated by comparison costs). So far, only a handful of approaches have combined computational workload reduction strategies with biometric template protection. In the context of face biometrics, those studies have mainly employed cancelable biometrics, e.g. [23]-[25]. However, most of those systems still report a degradation w.r.t. biometric performance when benchmarked against unprotected systems. Practically feasible applications of homomorphic encryption in biometric identification systems have likewise been presented [26]-[29]; however, while suffering little to no biometric performance degradation, these schemes have also relied on exhaustive search in a biometric identification scenario and have not considered integration of computational workload reduction such as biometric indexing.
arXiv:2107.12675v1 [cs.CV] 27 Jul 2021

A. Contribution and Organisation
The main contributions of this article are as follows:
• A comprehensive overview and literature survey of works pertaining to (and especially those combining) the areas of information fusion, computational efficiency, and template protection in biometric identification systems.
• A proposal of a multi-stage protected indexing and retrieval system for facial biometric identification based on optimised information fusion and incorporating data privacy-preservation with homomorphic encryption.
• A thorough theoretical analysis and empirical evaluation of the proposed system on a large dataset with state-of-the-art facial recognition systems. Using ISO/IEC IS 19795-1 [30] compliant experimental protocol and metrics, the proposed system is shown to reduce the computational workload of a biometric identification retrieval by approximately 90%, while simultaneously maintaining the baseline biometric performance. Additionally, the possibility of seamless integration of post-quantum-secure homomorphic encryption means that the data security and privacy objectives specified in ISO/IEC IS 24745 [22] are ensured.
The remainder of this article is organised as follows: section II provides relevant background information and an overview of related works. The proposed system is described in section III. The experimental setup and the obtained results are presented in sections IV and V, respectively. Section VI contains concluding remarks and a summary.

II. BACKGROUND AND RELATED WORK
The system proposed in this article combines three research areas within biometrics, i.e. information fusion (subsection II-A), computational workload reduction (subsection II-B), and template protection (subsection II-C). This section provides a brief overview of the relevant background information and key related works in those areas.

A. Information Fusion
Information fusion can be used in order to improve the discriminative power of a biometric recognition system. Systems referred to as "multi-biometric systems" take advantage of multiple information sources which are combined (fused) in some way. The following fusion categories can generally be distinguished in the context of biometrics [31], [32]:
Multi-type: multiple biometric characteristics (such as facial images and fingerprint scans) are used.
Multi-sensorial: the biometric data acquisition is conducted with diverse sensors providing complementary information (for example, near-infrared and visible-wavelength cameras).
Multi-algorithm: the biometric data is processed utilising several complementary algorithms (for instance, image descriptors based on texture and keypoint information).
Multi-instance: more than one instance of the same underlying type of biometric characteristic is used (e.g. the images of the right and left iris).
Multi-sample: several biometric samples stemming from one type of biometric characteristic are used (e.g. multiple acquisitions of a fingerprint scan with the purpose of detecting reliable regions or assuring the quality and consistency of the acquired data).
Information fusion can occur at different steps of the biometric processing pipeline [31], [32], including:
Sensor: raw data (e.g. images acquired by different sensors or multiple samples) is combined before other processing steps [33], [34].
Feature: the extracted feature sets (e.g. from multiple samples) are consolidated [35], [36].
Score: the comparison scores computed through different information channels are combined (e.g. averaged) [37], [38].
Rank: the orders (ranks) of potential matches between a probe and the enrolment database obtained through different information channels are consolidated [39], [40].
Decision: the decisions (i.e. acceptance or rejection) obtained through multiple information channels are combined (e.g. by a majority vote) [41], [42].
In the context of this work, the fusion of multiple samples (from different data subjects) at the feature level is of most interest, as the system proposed in section III is designed to operate at that level of the biometric processing pipeline.
The topic of information fusion in biometrics has been addressed extensively in the scientific literature. In [31], a general introduction to this topic is given, while [32], [43], [44] provide recent and comprehensive surveys of this research area.

B. Computational Workload Reduction
Maintaining fast biometric identification system response times often requires optimisation or additional investments as the size of the enrolment database increases. The computational costs of the typical, exhaustive search-based, retrieval method tend to grow linearly with the number of enrolled data subjects [45]. Naturally, the expansion of the underlying hardware (e.g. by using many servers which facilitate distributing the computations) can be used to maintain quick system response times; however, this solution carries with it high monetary costs, such as the purchase of the equipment, its installation and maintenance, etc. While hardware investments are often inevitable, an often overlooked possibility is the optimisation of the underlying software and/or algorithms. In this context, the field of computational workload reduction has emerged in recent years and numerous methods have been proposed which can help to mitigate some of the costs of the physical infrastructure. The goal of such methods is the reduction of the required amount of computations for some specific tasks in the biometric recognition pipeline. As the computational costs of the biometric template comparisons typically dominate the overall computational effort in biometric identification transactions, most of the approaches proposed in the literature are aimed specifically at optimising this step of the biometric identification pipeline [14]. More specifically, two broad classes of approaches can be distinguished: preselection, concentrating on the reduction of the search space, i.e. the number of necessary template comparisons (see e.g. [46]), and feature transformation, aimed at lowering the computational cost of the individual template comparisons (see e.g. [47]). The former are of interest in the context of this article.
Numerous methods rely on the so-called pre-filtering of the enrolment database during a biometric identification transaction. Such methods depend on categorical or weakly discriminative features (e.g. geographic and/or demographic metadata [48] or soft biometrics [49]), whereby the potential search space can be narrowed down quickly prior to considering the actual highly discriminative, but more computationally expensive, biometric features. Conceptually similar two-stage or multi-stage methods operating on weakly discriminative, compact (e.g. dimensionally-reduced or binarised) representations of biometric data have also been considered [50]-[52]. Likewise, general concepts of coarse-to-fine search, nearest-neighbour search, and clustering based on the feature sets extracted from biometric samples have also been proposed [14], [46].
More complex methods directly utilising the extracted biometric features and aimed at creating an intelligent search structure (e.g. a search tree) have been shown to be capable of significantly reducing the computational workload. In [53], a tree-based indexing and retrieval system for iris data has been proposed. Many successful methods of biometric indexing integrate information fusion; for example, [54], [55] for multi-instance fingerprint and iris data, respectively. Furthermore, generic multi-biometric indexing methods have also been proposed, e.g. in [56]-[58].
In [59], a multi-biometric cascade has been proposed with the aim of successively filtering the candidate short-lists based on score-level information. Similar concepts were utilised in [60], [61], where a signal-level fusion (i.e. morphing, see e.g. [62]) of facial images facilitates a computationally efficient and accurate indexing and retrieval for biometric identification. Those methods are most closely related to the indexing and retrieval method presented in this article.
Generally, the methods mentioned in this subsection often require the storage of additional information (e.g. metadata) and/or a kind of "setup" step (e.g. creation of a search structure) which requires some computational effort, but only needs to be performed infrequently. On the other hand, many of the described methods facilitate the reduction of computational workload associated with biometric identification transactions by several orders of magnitude w.r.t. the typical exhaustive search-based retrieval method.

C. Biometric Template Protection
Biometric template protection has been an active field of research for more than two decades. Comprehensive surveys on this topic can be found in [19]-[22]. Biometric template protection methods are usually categorised as cancelable biometrics and biometric cryptosystems. Cancelable biometrics employ transforms in the signal or feature domain which enable a biometric comparison in the transformed (encrypted) domain [65]. Biometric cryptosystems commonly bind a key to a biometric feature vector, resulting in a protected template. Biometric comparison is then performed indirectly by verifying the correctness of a retrieved key [66]. Further, homomorphic encryption can be employed for biometric template protection [67]. Homomorphic encryption makes it possible to compute operations in the encrypted domain which are functionally equivalent to those in the plaintext domain and thus enables the estimation of certain distances between homomorphically encrypted biometric templates. As defined in ISO/IEC IS 24745 [22], biometric template protection schemes shall fulfil the following requirements:
Unlinkability: the infeasibility of determining if two or more protected templates were derived from the same biometric instance, e.g. face. By fulfilling this property, cross-matching across different databases is prevented.
Irreversibility: the infeasibility of reconstructing the original biometric data given a protected template and its corresponding auxiliary data. With this property fulfilled, the privacy of the users' data is increased, and additionally the security of the system is increased against presentation and replay attacks. Depending on the used template protection method, guaranteeing this property may rely on sufficiently protecting a certain secret (e.g. private encryption key(s)) from being compromised by an attacker.
Renewability: the possibility of revoking old protected templates and creating new ones from the same biometric instance and/or sample, e.g. face image. With this property fulfilled, it is possible to revoke and reissue the templates in case the database is compromised, thereby preventing misuse.
Performance preservation: the requirement of the biometric performance not being significantly impaired by the protection scheme.
Table I lists the mentioned types of biometric template protection and their properties w.r.t. the above criteria as well as key derivation and efficient biometric comparison. The majority of approaches on cancelable biometrics and biometric cryptosystems report a performance gap between protected and original (unprotected) systems [21], as opposed to approaches employing homomorphic encryption. Cancelable biometrics usually employ a biometric comparator similar or equal to that of unprotected biometric systems. Therefore, cancelable biometrics are expected to maintain the comparison speed of the unprotected system, which also makes them suitable for biometric identification [14]. In contrast, biometric cryptosystems may need more complex comparators. Similarly, homomorphic encryption usually requires higher computational effort. Practical applications of certain template protection methods, e.g. homomorphic encryption, rely on maintaining the secrecy of the private key(s) used to protect the data (see also subsection V-D).
Some research efforts and standardisation activities have been devoted to establishing metrics for evaluating the aforementioned properties of biometric template protection schemes, e.g. in [68]- [71]. Nonetheless, additional specific cryptanalytic methods may be necessary to precisely estimate the security/privacy protection achieved by a particular template protection scheme. Moreover, the result of such an evaluation also depends on the biometric data to which the template protection system is applied. This makes a comparison of published results difficult and sometimes misleading.
In 2001, Ratha et al. [72] proposed the first cancelable face recognition system using image warping to transform biometric data in the image domain. Another popular cancelable transformation of face images based on random convolution kernels was presented in [73]. In contrast to [72], this approach employs a fundamentally reversible distortion of the biometric signal based on some random seed, for which the term "biometric salting" was later coined. The majority of published cancelable face recognition schemes apply transformations in the feature domain [65]. Over the past years, numerous feature transformations have been proposed in order to construct face-based cancelable biometrics, e.g. BioHashing [74], BioTokens [75], and Bloom filters [76]. Recently, feature transformations have been specifically designed for deep convolutional neural networks, e.g. random subnetwork selection [77]. Analyses of some popular cancelable face recognition systems have uncovered security gaps, e.g. in [78]-[81], and have already led or are expected to lead to (continuous) improvements of such schemes. Regarding biometric cryptosystems, the fuzzy commitment scheme [82] and the fuzzy vault scheme [83] represent widely used cryptographic primitives. Both schemes enable an error-tolerant protection of (biometric) data by binding them with a secret, i.e. a key. Binarised face feature vectors have been protected through the fuzzy commitment scheme in various scientific publications, e.g. in [84], [85]. Some works have also employed the fuzzy vault scheme for face template protection, e.g. in [86]-[88]. It is worth mentioning that some template protection approaches combine concepts of cancelable biometrics with those of biometric cryptosystems, resulting in hybrid schemes [20].
For a long time, homomorphic encryption has been considered impractical for biometric template protection due to its computational workload. However, in recent years, homomorphic encryption has been applied effectively to face-based biometric systems [26]. Depending on the used homomorphic cryptosystem, different feature type transformations might be required [28].
[Figure 1: indexing in the proposed system (enrolment DB, pairing and feature-level fusion, metadata/statistics)]
Relevant works on biometric template protection for face-based identification systems, i.e. one-to-many comparisons, are shown in table II. Some of the listed approaches are cancelable biometrics, which usually retain the biometric comparator of the corresponding unprotected system. As mentioned earlier, this property makes these approaches well suited for application in identification mode. In addition, approaches for face identification with homomorphic encryption have been proposed, e.g. in [27], [29]. These works use different concepts to maximise the efficiency of homomorphic encryption, including optimisation strategies such as batching or dimensionality reduction. In summary, it is important to note that all published works on biometric template protection for face identification employ an exhaustive search, i.e. these schemes scale linearly w.r.t. the number of protected reference face templates in the database.
As mentioned in subsection II-C, a large concern in biometric system deployments is the risk of data exposure. Simultaneous efficient indexing and protection of biometric data has been proposed e.g. in [24], [63], [90]-[92]. The works in [27]-[29] explore the use of homomorphic encryption in conjunction with biometric identification and attempt to reduce computational workload by applying a packing strategy to decrease the computation between the ciphertexts or by applying different (more computationally efficient) HE schemes. In summary, coupling biometric template protection with computational workload reduction (i.e. ensuring privacy and computational efficiency in addition to high biometric performance) is an insufficiently addressed topic in biometric research.

III. PROPOSED SYSTEM
The high-level, conceptual overview of the proposed system is shown in figure 1 (indexing) and algorithm 1 (retrieval). The proposed system relies on the creation of an efficient tree-like search structure by fusing the reference templates stored in the enrolment database. Let $N$ be the number of subjects in the enrolment database and $n_i$ (selected from the set $\{2^x \mid x \in \mathbb{N}^+\}$) be the number of subjects contributing to the fused templates at the $i$'th level of the tree-like search structure. For instance, in figure 1, the roots of the indexing trees consist of four subjects, i.e. $n_1 = 4$. On the following levels, this number decreases ($n_2 = 2$) until non-fused reference templates are considered at the final level ($n_3 = 1$). Subsections III-B and III-C provide details on how the templates to be fused are paired and what information fusion methods are used. During a biometric identification transaction, the created search structure is traversed, whereby the biometric probe is compared against the fused templates in order to successively narrow down the list of potential candidate identities at each level of the search structure. The search structure has $\log_2 n_1 + 1$ levels; let $k_i$ represent the fraction of nodes and their corresponding identities selected at the $i$'th level of the search structure. The key idea here is for $k_i$ to be relatively small and decreasing at each level of the search structure. Subsection III-A provides more details on this retrieval algorithm, as well as a theoretical analysis of the possible gains in computational efficiency w.r.t. a naïve exhaustive search-based retrieval algorithm. The proposed system also allows for a seamless integration of template protection as described in subsection III-D.

A. Retrieval
Since the fused templates retain sufficient discriminative power, the probes exhibit better comparison scores against their respective correct (mated) fused templates than against the other (non-mated) fused templates. Consequently, it is possible to make a robust pre-selection of a candidate short-list to be passed onto the next level of the cascade. In a successive manner, which is conceptually similar to the previous works on multi-modal and signal-level fusion-based cascades of Drozdowski et al. [59], [61], the candidate short-list shrinks at each level, thus resulting in fewer template comparisons being made and hence in computational workload reduction.

Algorithm 1 Retrieval in the proposed system
Input: probe, indexing trees
Output: candidates
candidates ← roots of indexing trees
for $i = 1$ to $\log_2 n_1 + 1$ do
    scores ← compare probe with all candidates
    best scores ← find $|scores| \cdot k_i$ highest scores
    candidates ← select candidates with best scores
end for
return candidates

The computational workload ($W$) [30] of the proposed retrieval scenario can be obtained using the following formula:
[Equation 1]
This equation expresses the computational workload of the proposed indexing and retrieval method as a percentage of the workload required in the typical baseline scenario where an exhaustive ($1{:}N$) search is carried out. Figure 2 illustrates the impact of the parameters of the proposed system on its computational workload. The x-axis shows the number of fused templates at the root level of the search tree, i.e. how many templates are fused with each other (the $n_1$ parameter). The y-axis denotes the fraction of templates pre-selected at the root level ($k_1$) followed by a cascade with logarithmically decreasing pre-selection sizes. $\Omega(W)$ denotes the theoretical lower bound, i.e. 1 fused template being pre-selected at each level of the cascade.
The values in the figure are given for an $N$ value which was used in the empirical experiments reported later on in the paper. With an increasing $N$ value (i.e. growth of enrolment database size), the lower bound of computational workload for the proposed system can be expressed as follows:
[Equation 2]
The following three observations can be made:
1) There do exist configurations (based on the $k_1$ and $n_1$ parameters) which require significantly less computational workload than the baseline. In other words, provided sufficient discriminative power, the proposed system is capable of reducing the computational workload in biometric identification.
3) The $n_1$ parameter has a large impact on the possible computational workload reduction. Each time $n_1$ is doubled (i.e. the height of the cascade increases), the workload is approximately halved for all $k_1$ values.
From the above observations, it follows that from a workload-centric perspective, one would prefer as high an $n_1$ value as possible in order to achieve the highest possible workload reduction, with the $k_1$ values usually being a secondary concern. Consider, for instance, that for $n_1 = 8$, the lower bound for achievable workload is 12.65%. If $n_1$ can be increased to 16, the aforementioned workload is already achieved at a relatively high fraction ($k_1 = 2^{-2}$) of templates being pre-selected at that level. It is, however, important to remember that the indexing and pre-selection may increase the false-negative errors, i.e. the parameters $n_1$ and $k_1$ likely cannot be set to achieve the lower bound for computational workload without simultaneously causing a significant impairment of the biometric performance. In other words, the desired reduction in computational workload needs to be feasible w.r.t. the discriminative power of the utilised recognition system. This trade-off between computational workload and biometric performance is evaluated empirically later on in this article.
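The cascade of algorithm 1 can be illustrated with a minimal sketch that also counts template comparisons. Several details are assumptions made purely for illustration and are not taken from the article: the dot product as comparator, simple averaging as fusion, pairing of neighbouring subjects, the explicit expansion of selected nodes into their children, and all names (`Node`, `build_index`, `retrieve`).

```python
import math

def dot(a, b):
    """Toy comparator (assumption): a higher dot product means more similar."""
    return sum(x * y for x, y in zip(a, b))

def average(a, b):
    """Toy fusion (assumption): element-wise arithmetic mean."""
    return [(x + y) / 2 for x, y in zip(a, b)]

class Node:
    def __init__(self, template, children=(), subject=None):
        self.template = template
        self.children = list(children)
        self.subject = subject  # only set for non-fused reference templates

def build_index(templates, n1):
    """Fuse neighbouring nodes pairwise until each root covers n1 subjects."""
    nodes = [Node(t, subject=i) for i, t in enumerate(templates)]
    for _ in range(int(math.log2(n1))):
        nodes = [Node(average(a.template, b.template), (a, b))
                 for a, b in zip(nodes[0::2], nodes[1::2])]
    return nodes

def retrieve(probe, roots, ks):
    """Traverse the cascade; ks[i] is the fraction kept at level i + 1."""
    candidates, comparisons = roots, 0
    for level, k in enumerate(ks):
        comparisons += len(candidates)  # one comparison per candidate
        ranked = sorted(candidates, key=lambda n: dot(probe, n.template),
                        reverse=True)
        kept = ranked[:max(1, math.ceil(len(ranked) * k))]
        is_last = level == len(ks) - 1
        candidates = kept if is_last else [c for n in kept for c in n.children]
    return candidates, comparisons

# Eight subjects with one-hot templates, roots fused from n1 = 4 subjects.
templates = [[1.0 if j == i else 0.0 for j in range(8)] for i in range(8)]
roots = build_index(templates, n1=4)
found, comparisons = retrieve(templates[5], roots, ks=[0.5, 0.5, 0.5])
workload_percent = 100 * comparisons / len(templates)
print([n.subject for n in found], comparisons, workload_percent)
```

In this toy configuration the probe of subject 5 is found with 6 comparisons instead of the 8 needed by an exhaustive search. The gain is small only because the database is tiny; the number of root comparisons scales with $N / n_1$.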

B. Selection of Feature Vector Pairs
Deciding which parent samples to fuse with each other is expected to have a non-trivial impact on the efficacy of the proposed system. With an intelligent matching of the fused subject pairs, an increase in the discriminative power of the pre-selection procedure is expected, thereby improving the overall results of the proposed system in terms of biometric performance and computational workload.
Ideally, similar data subjects/samples would be fused with each other. Conceptually, matching such pairs belongs to an old and well-known class of combinatorial optimisation problems. One could formulate it in terms of a stable roommates or stable marriage problem. In practical experiments, however, such formulation has been plagued by issues related to "odd pairs" and solvability on a large set of data (see [93]- [95]). In this work, those issues are circumvented by optimising the matching algorithm with a global cost function instead of seeking a stable matching. The benefit of this approach is that some poorly matched (i.e. with a high cost) pairs are allowed, while the overall matchings are well-optimised for a given enrolment database. In practical experiments, this formulation (corresponding to the assignment problem) has been applied successfully.
More formally, let $S$ represent the set of data subjects present in the enrolment database. A bijective mapping of this set to itself is sought, i.e. $f : S \to S$, with an additional constraint that the subjects may not be mapped to themselves, i.e. $\forall s \in S, f(s) \neq s$. Given a weight function $C : S \times S \to \mathbb{R}^+$, the aim of a successful mapping is to minimise $\sum_{s \in S} C(s, f(s))$. This work considered three methods for mapping selection:
Random: samples are paired purely by chance, i.e. no special algorithm is used for the pair selection.
Soft-biometric: similarity based on soft-biometric attributes (sex, race, age) is computed between the enrolled samples as a basis for the assignment.
Similarity-score: similarity based on non-mated comparison scores between the enrolled samples computed with a facial recognition system serves as a basis for the assignment.
In practice, given an enrolment database with $N$ subjects, a square matrix with the aforementioned similarity scores (soft-biometric or recognition based) can be created as illustrated in equation 3.
$$
\begin{array}{c|cccc}
 & S_1 & S_2 & \cdots & S_N \\ \hline
S_1 & \infty & c_{1,2} & \cdots & c_{1,N} \\
S_2 & c_{2,1} & \infty & \cdots & c_{2,N} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
S_N & c_{N,1} & c_{N,2} & \cdots & \infty
\end{array}
\tag{3}
$$
There, $S_x$ denotes the $x$'th data subject, while $c_{x,y}$ denotes the cost of pairing the $x$'th and $y$'th data subject with each other. To represent the constraint of data subjects not being allowed to be paired with themselves, the diagonal is set to $\infty$. In the concrete software implementation, the largest possible value of a floating-point datatype is used instead.
As formulated above, a polynomial time solution for the problem exists using the so-called Hungarian algorithm [96]. An iterative procedure is used to produce pairs for subsequent steps in the cascade, i.e. for $n > 2$. While computationally intensive, this step is only required once (offline) during indexing of the enrolment database and not during every retrieval. Figure 3 shows examples of subjects paired using the soft-biometric and similarity-score based methods described above. Details on the dataset and face recognition systems used in the experiments are provided in section IV.
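The optimisation target described above can be made concrete with a small sketch. It uses a brute-force search over all permutations as a stand-in for the Hungarian algorithm, which is only feasible for very small $N$; the cost values, the function name `min_cost_mapping`, and the final conversion of the mapping into unordered pairs are hypothetical and chosen for illustration only.

```python
import itertools
import sys

# Largest float stands in for infinity on the diagonal, as described in the text.
BIG = sys.float_info.max

def min_cost_mapping(cost):
    """Exhaustively find the mapping f with f(s) != s that minimises the
    summed pairing cost (brute force, illustration only)."""
    n = len(cost)
    best, best_total = None, float("inf")
    for perm in itertools.permutations(range(n)):
        if any(perm[s] == s for s in range(n)):
            continue  # subjects may not be mapped to themselves
        total = sum(cost[s][perm[s]] for s in range(n))
        if total < best_total:
            best, best_total = list(perm), total
    return best, best_total

# Hypothetical symmetric cost matrix for four subjects (lower = more similar).
cost = [
    [BIG, 1.0, 4.0, 5.0],
    [1.0, BIG, 6.0, 4.0],
    [4.0, 6.0, BIG, 2.0],
    [5.0, 4.0, 2.0, BIG],
]
mapping, total = min_cost_mapping(cost)
# Here the optimal mapping happens to consist of mutual pairs; in general a
# minimum-cost mapping may contain longer cycles that need extra handling.
pairs = sorted({tuple(sorted((s, mapping[s]))) for s in range(len(cost))})
print(mapping, total, pairs)
```

For this matrix the mapping pairs subject 0 with 1 and subject 2 with 3 at a total cost of 6. For realistic database sizes, a polynomial-time solver for the assignment problem would replace the exhaustive loop.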

C. Feature Fusion Methods
The choice of information fusion method has a non-trivial impact on the discriminative power and hence the biometric performance of a recognition system [43], [44]. The overarching goal of feature-level fusion is to create a fused feature vector , v i ∈ R from a pair of feature vectors v and v of same size. For simplicity, this definition and the formulas of the used fusion methods are provided in a notation for two feature vectors being fused (i.e. n = 2). However, for the specific application scenario considered in this article, they can be (and are) trivially extended to an arbitrary number (n) of feature vectors from the set {2 i | i ∈ N + }. Three fundamentally different types of feature fusion methods are considered and described below. In the provided formulas, µ represents an overall average value of elements at a given position and is computed on a disjoint training set of feature vectors.
1) Average-based: An intuitive method to fuse feature vectors is averaging. The following variants of this method are considered:
Simple average: The arithmetic mean of the elements at each feature position is taken: $v'_i = (v_i + \bar{v}_i)/2$, where $v_i$ and $\bar{v}_i$ are the elements of the two contributing vectors and $v'_i$ is the fused element.
Weighted average: The arithmetic mean of the elements at each feature position is taken and additionally weighted by the distance of each element from the overall average at the given position, computed on a training set. In other words, elements which strongly deviate from the average are assigned more weight.
The above two methods are henceforth referred to as "Average-1" and "Average-2".
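The two averaging variants can be sketched in a few lines of Python. The simple average follows the definition directly. The exact weighting formula of the weighted average is not reproduced in this text, so `average_2` below implements one plausible reading (weights equal to the absolute deviation from the training mean, normalised per position) and should be treated as an assumption. The function names and example values are hypothetical.

```python
def average_1(v, w):
    """Simple average: arithmetic mean at each feature position."""
    return [(a + b) / 2 for a, b in zip(v, w)]


def average_2(v, w, mu):
    """Weighted average (ASSUMED form): weight each element by |element - mu|."""
    fused = []
    for a, b, m in zip(v, w, mu):
        wa, wb = abs(a - m), abs(b - m)
        if wa + wb == 0:
            # Both elements equal the mean; fall back to the plain mean.
            fused.append((a + b) / 2)
        else:
            fused.append((wa * a + wb * b) / (wa + wb))
    return fused


# Hypothetical three-element feature vectors and per-position training mean.
v = [0.2, -0.4, 1.0]
w = [0.6, 0.0, 1.0]
mu = [0.0, 0.0, 1.0]

print(average_1(v, w))
print(average_2(v, w, mu))
```

Under this reading, an element lying exactly on the training mean receives zero weight, so the fused value at that position is taken entirely from the other vector.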
2) Distance-based: The following methods rely on putting the values of the individual elements in relation to some overall properties of the feature vectors.
Distance from mean: For each element position, the element furthest from the overall average at the given position is taken: $v'_i = \arg\max_{x \in \{v_i, \bar{v}_i\}} |x - \mu_i|$. In other words, the element which exhibits the strongest deviation from the average at a given position is used directly.
Distance from mean, rank-based: We define $\#(v_i) \in \mathbb{N}$ to be the rank of $v_i$ in the sequence of elements of $v$ sorted in ascending order according to their distance to the mean $\mu$, preserving duplicate elements. Following this operation, the element with the highest rank at a given position is chosen: $v'_i = \arg\max_{x \in \{v_i, \bar{v}_i\}} \#(x)$.
The above two methods are henceforth referred to as "Distance-1" and "Distance-2".
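A minimal Python sketch of the two distance-based variants follows. It assumes that the distance to the mean is measured per position as the absolute difference to the training mean at that position, and that ties are resolved in favour of the first vector; neither detail is specified in the text. Function names and values are hypothetical.

```python
def distance_1(v, w, mu):
    """Per position, keep the element that deviates most from the training mean."""
    return [a if abs(a - m) >= abs(b - m) else b for a, b, m in zip(v, w, mu)]


def ranks(v, mu):
    """Rank (1 = closest to the mean) of every element within its own vector."""
    # sorted() is stable, so duplicate distances keep their original order.
    order = sorted(range(len(v)), key=lambda i: abs(v[i] - mu[i]))
    rank = [0] * len(v)
    for r, i in enumerate(order, start=1):
        rank[i] = r
    return rank


def distance_2(v, w, mu):
    """Per position, keep the element whose within-vector rank is highest."""
    rv, rw = ranks(v, mu), ranks(w, mu)
    return [a if ra >= rb else b for a, b, ra, rb in zip(v, w, rv, rw)]


# Hypothetical feature vectors and training mean.
mu = [0.0, 0.0, 0.0]
v = [0.1, -0.9, 0.5]
w = [0.3, 0.2, -0.4]

print(distance_1(v, w, mu))
print(distance_2(v, w, mu))
```

The two variants can disagree: at the last position the first vector's element deviates more in absolute terms, while the second vector's element is the most extreme within its own vector and therefore wins on rank.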
3) Index-based: The following methods depend on the position of the elements in the feature vectors.
Section segregation: A portion (e.g. half) of each of the contributing vectors is taken directly.
Alternating index: The feature elements are taken directly from each of the contributing vectors in an alternating manner.
The above two methods are henceforth referred to as "Index-1" and "Index-2".
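The index-based variants amount to simple slicing and interleaving. The sketch below assumes that section segregation takes the first half of the positions from one vector and the second half from the other, and that the alternating variant starts with the first vector at position 0; both choices are illustrative assumptions, as are the function names.

```python
def index_1(v, w):
    """Section segregation: first half from v, second half from w (assumed split)."""
    half = len(v) // 2
    return v[:half] + w[half:]


def index_2(v, w):
    """Alternating index: even positions from v, odd positions from w (assumed order)."""
    return [a if i % 2 == 0 else b for i, (a, b) in enumerate(zip(v, w))]


# Hypothetical four-element feature vectors.
v = [1.0, 2.0, 3.0, 4.0]
w = [5.0, 6.0, 7.0, 8.0]

print(index_1(v, w))
print(index_2(v, w))
```

In both cases each contributing vector retains only half of its elements.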

D. Template Protection
In the proposed scheme, template protection is facilitated through integration of homomorphic encryption. In general, an encryption algorithm $E$ has the homomorphic property for an operation $\odot$ if it holds that $E(m_1) \odot E(m_2) = E(m_1 \odot m_2)$, $\forall m_1, m_2 \in M$, where $M$ is the set of all possible messages. For more details on this topic, see e.g. a detailed survey in [97]. As shown in [27], [28], the template comparator for biometric templates extracted from facial images can be feasibly implemented in the homomorphically encrypted domain. In other words, during template comparison in the protected domain, operations which are mathematically identical to those in the unprotected domain are conducted. Thus, the protection of the templates in the proposed scheme results in no loss of biometric performance (in contrast to typical biometric cryptosystems and cancelable biometrics). Using homomorphic encryption libraries described in subsection IV-C, the biometric probes, as well as the stored biometric reference templates and the index constructed from the fused templates can all be encrypted and compared in the protected domain, thereby fulfilling the biometric template protection objectives (unlinkability, irreversibility, renewability, and performance preservation) of ISO/IEC IS 24745 [22] (see subsection V-D for more details).
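The homomorphic property itself can be demonstrated with a toy example. The sketch below uses unpadded textbook RSA with tiny parameters, which is homomorphic with respect to multiplication. It is neither secure nor one of the lattice-based schemes used in this work, and it is deterministic (so it would not provide unlinkability); it serves only to make the definition concrete. All parameter values are illustrative.

```python
# Textbook RSA with tiny, insecure parameters (illustration only).
P, Q = 61, 53
N = P * Q   # modulus, 3233
E = 17      # public exponent
D = 2753    # private exponent: (E * D) % ((P - 1) * (Q - 1)) == 1


def encrypt(m):
    """Encrypt an integer message 0 <= m < N."""
    return pow(m, E, N)


def decrypt(c):
    """Decrypt a ciphertext produced by encrypt()."""
    return pow(c, D, N)


m1, m2 = 7, 11

# Multiply the two ciphertexts without ever decrypting them.
c_product = (encrypt(m1) * encrypt(m2)) % N

# The result is a valid encryption of the product of the plaintexts.
print(c_product == encrypt(m1 * m2))
print(decrypt(c_product))
```

Schemes such as BFV and CKKS support both addition and multiplication on ciphertexts, which is what allows a distance-based comparator to be evaluated in the encrypted domain.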

IV. EXPERIMENTAL SETUP
This section provides a detailed description of the setup for the conducted experiments. The used dataset and face recognition systems are described in subsections IV-A and IV-B, respectively; subsection IV-C describes the used homomorphic encryption software, while subsection IV-D gives an overview of the evaluation methodology and metrics.

A. Dataset
The academic MORPH dataset by Ricanek et al. [98] has been used in the experiments. A subset of images was selected based on approximate conformance with ICAO requirements for passport images [99]. As the proposed system is intended to function with such semi-constrained images, the so-called "in-the-wild" datasets were not considered. Furthermore, the chosen dataset facilitates the soft-biometric pairing method described in subsection III-B, as ground-truth metadata is available for three demographic attributes: sex, race, and age. Figure 4 shows example images from the used dataset, while table III provides a numerical summary of its partitioning for the experiments.

B. Face Recognition Systems
Two well-known open-source face recognition systems are used in the experiments:
ArcFace: A somewhat recent (initial publication in 2018), but continually improved and refined system published by Deng et al. [100]. The code and pre-trained model "LResNet100E-IR,ArcFace@ms1m-refine-v2" provided by the authors are used.
CurricularFace: A very recent system (2020) published by Huang et al. [101]. The code and pre-trained model "IR101" provided by the authors are used.
Both systems achieve excellent biometric performance in popular large-scale face recognition benchmarks. The systems extract compact feature vectors with 512 floating-point elements. Those vectors can be seamlessly fused using the methods described in subsection III-C. Euclidean distance is used to compute the dissimilarity between two feature vectors.

D. Evaluation Metrics
In the experiments, the proposed method is evaluated against an exhaustive search-based baseline. Two key aspects are considered using standardised methods and metrics [30], supported by additional ones which are commonly reported in the scientific literature:
Biometric performance: In closed-set identification experiments, the CMC curves, (true-positive) identification rate (IR), and rank-1 recognition rate (RR-1) are reported. In open-set identification experiments, the DET curves and the false negative identification rate at a decision threshold corresponding to a fixed false positive identification rate of 0.1% (denoted $\mathrm{FNIR}_{1000}$) are reported.
Computational workload: The overall computational workload (denoted $W$) of a single biometric identification transaction is calculated, and the workload reduction achieved by the proposed scheme w.r.t. the baseline is computed. This is done based on the necessary number of template comparisons and reported for the proposed system in percentage terms in relation to the exhaustive search-based baseline (i.e. with $W = 100\%$).
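As an illustration of the closed-set metrics, the following sketch computes the identification rate at rank $r$, and hence a CMC curve, from dissimilarity scores. It assumes one mated reference per probe and optimistic tie handling; the data layout, function names, and score values are hypothetical.

```python
def rank_of_mate(distances, mate):
    """1-based rank of the mated reference when references are sorted by distance."""
    # Count references that are strictly closer than the mate (optimistic ties).
    return 1 + sum(1 for d in distances if d < distances[mate])


def identification_rate(all_distances, mates, r):
    """Fraction of probes whose mated reference appears within the top r ranks."""
    hits = sum(
        1 for dist, mate in zip(all_distances, mates) if rank_of_mate(dist, mate) <= r
    )
    return hits / len(mates)


# Hypothetical distances: one row per probe, one column per enrolled reference.
distances = [
    [0.3, 0.9, 1.1, 1.2],  # mate is reference 0, retrieved at rank 1
    [0.8, 0.7, 0.5, 1.0],  # mate is reference 1, retrieved at rank 2
    [1.0, 0.6, 0.9, 0.4],  # mate is reference 2, retrieved at rank 3
]
mates = [0, 1, 2]

# The CMC curve is the identification rate as a function of the rank.
cmc = [identification_rate(distances, mates, r) for r in range(1, 5)]
print(cmc)
```

The first CMC value corresponds to RR-1, and the curve is non-decreasing in the rank by construction.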

V. RESULTS
The proposed system is evaluated experimentally as follows: in subsections V-A and V-B, an analysis is conducted to establish suitable configurations which simultaneously minimise the computational workload and maximise the biometric performance. In subsection V-C, the overall results for the selected optimal configurations are reported. The security properties and the scalability of the proposed method are discussed in subsections V-D and V-E, respectively.

A. Analysis of Computational Workload
Figure 5 shows the computational workload in terms of necessary template comparisons for the proposed indexing and retrieval method. In contrast to the general, theoretical overview from subsection III-A, this figure pertains to the specific experimental setup described in section IV. As noted in the theoretical analysis, the desired target area for the parameter ($n_1$ and $k_1$) selection lies in the bottom right corner of the matrix. For instance, given $n_1 = 16$ and $k_1 = 2^{-3}$, only 376 template comparisons are required for a biometric identification transaction, i.e. roughly 9% of the 4096 template comparisons needed in the baseline scenario. In general, there exist several parameter configurations which result in the numbers of necessary template comparisons being significantly (between approximately 4 and 16 times) lower than those of a baseline retrieval method performing an exhaustive search.
In the next subsection, an analysis is conducted to determine whether the desirable configurations w.r.t. computational workload (i.e. the rightmost part of the matrix in figure 5) are also feasible w.r.t. biometric performance.

B. Analysis of Pairing and Fusion Methods
From the perspective of biometric performance, the optimal selection of the $n_1$ and $k_1$ parameters depends on the following two factors: 1) The inherent discriminative power of the recognition system. 2) The information loss caused by the template fusion. The information loss due to template fusion further depends on two factors: the used fusion method (subsection III-C) and the number of templates fused with each other (the $n_1$ parameter). As previously mentioned, the proposed indexing and retrieval scheme may cause false-negative errors when improperly configured, while the false-positive errors would remain unaffected or even slightly reduced through its application. To evaluate the two aforementioned factors, a closed-set identification scenario can be used and evaluated using CMC curves, which report the identification rate at given ranks (denoted $r$) in an ordered list of comparison scores between the probes and enrolment database. Figure 6 shows the CMC curves for the considered facial recognition systems, template pairing methods, and template fusion methods. Aiming at the highest possible workload reduction (recall subsection V-A), $n_1 = 16$ (i.e. maximum rank is 256) is selected. Table V shows the numeric values of identification rate for the specific ranks depicted on the x-axis in the figure.
3) In all considered cases, the fusion methods based on averaging perform best, relatively closely followed by fusion methods based on distance from mean. The index-based fusion methods achieve poor biometric performance. The differences between the fusion method variants within their respective method types are insignificant.
4) Although the rank-1 identification rate is very low, both recognition systems quickly converge (for the best performing type of fusion methods) to 100% well before the maximum rank of 256.
5) In general, CurricularFace performs slightly better than ArcFace. However, the differences are not very large and the general trends described above persist across both recognition systems.
Based on the above evaluations and observations, the selection of optimal configurations for pre-selection can be made. Accordingly, the following choices are made:
Template pairing: the method based on non-mated comparison scores is chosen.
Template fusion: the method based on averaging the contributing templates is chosen.
Number of fused templates: $n_1 = 16$ can be used, as both recognition systems appear to exhibit sufficient discriminative power to compensate for the information loss caused by fusing so many templates.
Fraction of preselected templates: To avoid too many pre-selection errors, configurations with $\mathrm{IR}(r) > 99.5\%$ are considered. This condition is satisfied for both ArcFace and CurricularFace when $r \in \{128, 64, 32\}$, using the comparison score-based pairing and averaging-based fusion. With a maximum rank of 256, these $r$ values correspond to $k_1 = r/256 \in \{2^{-1}, 2^{-2}, 2^{-3}\}$. For recognition systems with greater discriminative power, it is conceivable to achieve even lower $r$ and $k_1$ values, thereby facilitating higher workload reduction.
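To make the role of the fused index and the pre-selection fraction concrete, the following sketch implements a simplified two-stage retrieval: the probe is first compared against the fused templates, only the best fraction $k_1$ of them is kept, and an exhaustive search is then run over the constituents of the surviving groups. The actual system uses a multi-stage cascade, so the comparison counts of this sketch are not those reported in the paper. The data, grouping, and function names are hypothetical, and everything is computed in the plaintext domain.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuse_average(vectors):
    """Simple-average fusion of an arbitrary number of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def build_index(references, groups):
    """Fuse the templates of every group into one index entry."""
    return [(fuse_average([references[s] for s in g]), g) for g in groups]


def retrieve(probe, references, index, k1):
    """Pre-filter with fused templates, then search only the surviving groups."""
    ranked = sorted(index, key=lambda entry: euclidean(probe, entry[0]))
    keep = max(1, int(len(index) * k1))
    candidates = [s for _, group in ranked[:keep] for s in group]
    best = min(candidates, key=lambda s: euclidean(probe, references[s]))
    comparisons = len(index) + len(candidates)
    return best, comparisons


# Hypothetical 2-D templates for eight subjects, grouped in similar pairs.
references = {
    "a": [0.0, 0.0], "b": [0.1, 0.0], "c": [1.0, 1.0], "d": [1.1, 1.0],
    "e": [0.0, 2.0], "f": [0.0, 2.1], "g": [3.0, 0.0], "h": [3.0, 0.2],
}
groups = [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"]]
index = build_index(references, groups)

probe = [1.08, 0.97]
best, comparisons = retrieve(probe, references, index, k1=0.25)
print(best, comparisons, "vs", len(references), "for an exhaustive search")
```

Lowering $k_1$ reduces the number of comparisons but increases the risk that the group containing the mated reference is discarded during pre-selection, which is the trade-off analysed above.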

C. Overall Results
To evaluate the overall performance of the proposed indexing and retrieval system, open-set and closed-set identification experiments are carried out for the configurations selected in subsections V-A and V-B. Figure 7 shows the obtained DET curves, while table VI reports the numeric results using metrics described in subsection IV-D. The following observations regarding computational workload and biometric performance can be made:
ArcFace: All three chosen configurations perform similarly to the baseline. The most conservative one in terms of computational workload reduction, i.e. $k_1 = 2^{-1}$, achieves biometric performance essentially indistinguishable from that of the baseline, while simultaneously requiring only around 18% of the computational workload that the baseline requires. The computational workload can be further reduced to less than 10% of the baseline workload ($k_1 = 2^{-3}$), while retaining a reasonable (albeit slightly reduced) biometric performance w.r.t. the baseline.
CurricularFace: The results mirror those of ArcFace, thus indicating a generalisability of the proposed indexing and retrieval method. The achieved computational workload reduction is identical, as the same parameter configurations have been used. The biometric performance of CurricularFace is slightly better than that of ArcFace, in particular at the $\mathrm{FNIR}_{1000}$ operating point.
The proposed system essentially maintains the biometric performance of the baseline at $k_1 \in \{2^{-1}, 2^{-2}\}$ for the practically relevant FPIR values, whereas $k_1 = 2^{-3}$ yields an even lower computational workload at the cost of a slight reduction in biometric performance.
In table VII, a summary of the computational requirements for the proposed protected indexing system is given in actual runtimes and storage usage for the off-the-shelf hardware mentioned in subsection IV-C.
It can be observed that:
• There exist massive differences in execution time and storage space usage between the benchmarked homomorphic encryption methods.
• The proposed indexing and retrieval method dramatically (by an order of magnitude) reduces the execution times of an identification transaction w.r.t. the baseline.
• The one-time computational costs of encrypting the enrolment database and its index are negligible.
• The execution times of the proposed system with BFV and especially CKKS based encryption do not suffice for real-time deployments, but could nevertheless be feasible whenever near-instantaneous system responses are not required.
• Near-realtime runtimes and very low storage usage are achieved for the proposed system with NTRU-based encryption. This is mostly because the Hamming weight (i.e. the sum of the differences between individual feature vector elements) cannot be computed in the encrypted domain using this scheme. On the other hand, in BFV- and CKKS-based schemes, the analogous sum can be (and is) computed in the encrypted domain. While in principle still secure and privacy-preserving [28], [106], this means that using NTRU in the proposed system introduces a further trade-off between computational requirements and the potential for some information leakage.

D. Security Analysis
The proposed system fulfils the biometric template protection objectives specified in ISO/IEC IS 24745 [22]:
Unlinkability: a random factor is utilised in the encryption functions. Thus, encrypting an identical plaintext twice results in two different, unlinkable ciphertexts.
Irreversibility: the used HE schemes are based on ideal lattices, i.e. are post-quantum-secure [109]. They provide encryption with the strength of 128, 192, or 256 bits. There exists a trade-off between security and computational requirements: as the encryption strength increases, so does the computational complexity.
Renewability: the HE key pair can be exchanged, whereupon the biometric templates in the enrolment database and index can be re-encrypted.
Performance preservation: the comparator used in the homomorphically encrypted domain is functionally identical to that of the plaintext domain, i.e. it yields the same comparison scores. The biometric performance of the proposed indexing system is nearly identical to that of the baseline.
While traditional encryption schemes may provide stronger security guarantees than homomorphic encryption, this does not constitute the actual limiting factor w.r.t. facial biometrics. The entropy of facial embeddings is considered to be much lower than the aforementioned achievable cryptographic protection levels. For example, in [110], it has been shown that while typical facial embeddings extracted by deep neural networks consist of 512 values, their intrinsic dimensionality is much lower (by more than an order of magnitude). In other words, it is more feasible (albeit still extremely difficult) to guess a sufficiently similar facial biometric template than to guess the encryption keys.
Finally, note that such attacks aimed at guessing the biometric templates and/or encryption keys (or other secrets) are not limited to the applications of homomorphic encryption for the purpose of biometric template protection. Other types of dedicated biometric template protection approaches (recall subsection II-C) as well as classic general-purpose (non-homomorphic) encryption must likewise address those challenges.

E. Scalability
As the size of the enrolment database increases, the following factors within the proposed system need to be considered:
Pairing: although the pairing algorithm is computationally intensive, its computational costs could be mitigated by distributing the computations or additionally binning the enrolment database. It should also be noted that an increased size of the enrolment database would result in a larger probability of finding suitable pairings, especially for the outlier subjects (and hence an increased discriminative power of the system).
Fusion: the operations for fusing the templates are implemented efficiently using vectorised operations; these computational costs are generally negligible, e.g. in comparison with those required for template pairing. Furthermore, this part of the proposed system's pipeline can be trivially parallelised and/or distributed.
Encryption: the computational costs of encrypting the enrolment database and index are generally very low and can additionally be trivially parallelised. The amount of RAM required to pre-load (for use during retrieval) the entire enrolment database and its index is approximately twice that of the baseline.
Retrieval: the computational workload of the proposed system scales sub-linearly w.r.t. the number of enrolled subjects (as opposed to a typical baseline, which scales linearly). Due to a flexible design of the proposed system, a dynamic adjustment (w.r.t. enrolment database size) of the decision thresholds and pre-selection subset sizes is possible. Lastly, the underlying concepts in the proposed indexing and retrieval system can be trivially distributed or parallelised.
The pairing, fusion, and encryption operations are computed infrequently and offline; they thus do not directly influence the (online) retrieval time. Considering the execution timings in table VII, it is important to note that the experiments were carried out in a single-threaded environment on an ordinary laptop. By taking advantage of parallelisation or distribution of the computations, as well as utilising more powerful hardware, these execution times could be vastly lowered (cf. [29]).

VI. CONCLUSION
In this article, a method of computationally efficient indexing and retrieval of biometric data has been presented. The proposed indexing method relies on intelligent pairing of facial parent templates based on their similarity (in terms of soft biometrics or non-mated comparison scores), followed by feature-level fusion. The created search structure facilitates a multi-step biometric identification retrieval, whereby the retrieved candidate lists are successively shortened in each step of the cascade.
In a comprehensive experimental evaluation, several different pairing and fusion methods were benchmarked for the indexing step using two modern, open-source face recognition systems. Using standardised evaluation protocols and metrics, the proposed method was shown to achieve a biometric performance nearly identical to that of an exhaustive search-based baseline; simultaneously, the computational workload of biometric identification transactions has been substantially reduced (down to ∼10%). In other words, by using the proposed system during biometric identification, a tenfold reduction in the required computational effort is possible with no negative impact on the biometric performance. By integrating homomorphic encryption, the proposed system achieves post-quantum-security and the biometric template protection objectives of unlinkability, irreversibility, and renewability.
In summary, the proposed system achieves a very good balance between biometric performance, computational efficiency, and privacy protection for biometric identification scenarios.