Iris-Fingerprint multimodal biometric system based on optimal feature level fusion model

Reliable and accurate multimodal biometric person verification demands an effective discriminant feature representation and fusion of the relevant information extracted across multiple biometric modalities. In this paper, we propose feature level fusion based on canonical correlation analysis (CCA) to fuse Iris and Fingerprint feature sets of the same person. The distinguishing aspect of this approach is that it extracts maximally correlated features from the feature sets of both modalities as effective discriminant information. CCA is therefore suitable for analyzing the underlying relationship between two feature spaces, and it generates more powerful feature vectors by removing redundant information. We demonstrate that, by exploiting CCA based joint feature fusion and optimization, efficient multimodal recognition can be achieved with a significant reduction in feature dimensions, lower computational complexity, and a recognition time of less than one second. To evaluate the performance of the proposed system, Left and Right Iris images and thumb Fingerprints from both hands in the SDUMLA-HMT multimodal dataset are considered. We show that the proposed approach significantly outperforms unimodal recognition in terms of equal error rate (EER). We also demonstrate that CCA based feature fusion outperforms match score level fusion. Further, an exploration of the correlation between Right Iris and Left Fingerprint images (EER of 0.1050%) and between Left Iris and Right Fingerprint images (EER of 1.4286%) is presented to examine the effect of feature dominance and laterality of the selected modalities on a robust multimodal biometric system.


Introduction
Biometric recognition of individuals has recently gained attention in applications ranging from international border crossing to unlocking mobile devices. Technological advances and improved accuracy, coupled with increasing demand for real-world applications, have led to the emergence of multimodal biometric systems. Integrating multiple biometric modalities has been shown to be more robust to non-universality, noisy data, and spoof attacks [1,2], and to be very effective in improving recognition performance [1,3]. However, the design and implementation of the fusion algorithm is challenging because its benefits depend on the choice of biometric modalities, computational and storage resources, accuracy, fusion strategy, and cost [1,3,4]. The fusion of multiple biometric evidence may be carried out at four different levels: sensor, feature, match score, or decision [1-7]. Among these levels, feature level fusion yields better recognition performance because more discriminative features from the different modalities can be preserved [1,8]. In a multimodal system, feature fusion is usually performed by integrating the features extracted from different modalities into a joint, compact feature representation. For homogeneous feature sets (having the same measurement scale and dimension), feature fusion is easier to apply, and a fused feature vector can be obtained using a weighted average. For heterogeneous feature sets (e.g., face and fingerprint), that is, feature sets extracted from different modalities using different feature extraction algorithms, a single feature set can be formed by concatenating them [9]. For incompatible feature sets (e.g., the IrisCode feature of the iris and the minutiae feature of the fingerprint), however, direct concatenation is difficult because of inherent differences in the feature representation [10].
Several authors [7,9-17] have explored different feature fusion approaches in order to fuse different modalities reasonably and effectively in multimodal systems.
According to the literature, conventional fusion methods such as weighted feature fusion, concatenation, and weighted concatenation ignore the intrinsic relationship between the feature sets, become inefficient as the dimensionality of the feature space increases, and require a complex matcher to classify the fused feature vector. To address these limitations, this paper proposes a learning method for feature fusion based on maximizing mutual information. The method retains the effective discriminant information within the feature sets and removes redundant information [18], which is especially important for effective recognition. To this end, we explore canonical correlation analysis (CCA) for feature fusion, which captures the mutual information between two sets of multidimensional data [19,20] by identifying linear relationships. The objective of CCA is to maximize the pairwise correlations between two feature sets: it looks for the optimal transformation that makes the corresponding variables in the two feature sets maximally correlated. In this way, the approach learns a mapping into a space in which correlation can be measured. The optimal transformation maximizes the similarity between the discriminatory feature sets of the different modalities while removing redundant features. In addition, CCA is invariant to affine transformations, which eliminates the need for a complex matcher design. CCA and several of its variants have been proposed in the literature for finding maximally correlated projections and have demonstrated better results than prevalent feature fusion methods in a wide variety of domains. In [18], CCA is proposed for feature fusion, where canonical correlation features from face and handwritten Arabic numerals are extracted as effective discriminant information for recognition.
In another work [21], feature vectors extracted from the palmprint and finger geometry are fused using CCA to obtain a reduced feature space dimension, thus improving the average recognition rate. In [22], kernel CCA (KCCA) based feature fusion is explored to learn a nonlinear subspace representation between the ear and profile face modalities. To improve the performance of CCA in classification, a supervised local-preserving canonical correlation analysis method (SLPCCAM) is proposed for fingerprint and finger vein [23]. To represent the similarity between samples in the same class well and to evaluate the dissimilarity between samples in different classes, a feature level fusion approach based on Discriminant Correlation Analysis (DCA) is presented in [24] for Iris, Fingerprint, and Face multimodal recognition. To deal with multiple sets of features, a feature fusion approach based on multiset generalized canonical discriminant projection (MGCDP), which incorporates class associations, is studied in [25]. Experiments show that MGCDP achieves promising recognition accuracy on palm vein, face, and fingerprint.
Further, a feature fusion approach based on CCA is presented in [26] for Iris and Fingerprint images and achieves significantly improved performance. In another work [27], CCA based on L1-norm minimization (CCA-L1) and its extension are proposed to deal with multi-feature data for feature learning and image recognition. In [28], two-dimensional supervised canonical correlation analysis (2D-SCCA) and multiple-rank supervised canonical correlation analysis (MSCCA) algorithms are proposed to perform multiple feature extraction for classification. Experiments show that MSCCA achieves promising recognition accuracy on object, face, and fingerprint recognition. In another study [29], multiple rank canonical correlation analysis (MRCCA) and its multiset version, referred to as multiple rank multiset canonical correlation analysis (MRMCCA), are explored for effective feature extraction from matrix data. The authors demonstrate the superiority of the proposed methods in terms of classification accuracy and computing time on face, fingerprint, and palm data sets. Recently, 2D models for multi-view feature extraction and fusion of matrix data, namely two-dimensional locality preserving canonical correlation analysis (2D-LPCCA) and two-dimensional sparsity preserving canonical correlation analysis (2D-SPCCA), are proposed in [30]. To reveal the inherent structure of the data, 2D-LPCCA uses neighborhood information while 2D-SPCCA uses sparse reconstruction information.
Motivated by the success of CCA and its extensions in feature fusion, in this paper we propose CCA to represent discriminative features by exploring significant relationships between the Iris and Fingerprint feature sets of the same person. To this end, we propose a simple, extremely fast, and promising approach that provides a unified framework for conveniently investigating the feature fusion information contributed mainly by CCA. In summary, the key contributions of this work are:
• We propose a novel approach for accurately modeling the feature fusion of Iris and Fingerprint modalities by maximizing the pair-wise correlations between them.
• We show the effectiveness of the proposed model through experiments on the publicly available SDUMLA-HMT multimodal dataset. The affine invariance property of CCA eliminates the need for a complex matcher and helps to design a rotation-invariant recognition system.
• We explore the effect of feature dominance and laterality of the selected modalities on the performance of the developed system by performing cross-match biometric feature fusion. For this, performance is evaluated on i) Left Iris and Right Fingerprint images and ii) Right Iris and Left Fingerprint images of the same person, yielding interesting initial results for the developed robust multimodal biometric system.
• We evaluate the proposed approach and show significantly improved recognition performance of the multimodal biometric system over other existing methods.
Paper organization: The proposed CCA based feature fusion multimodal system with different distance measures is outlined in Section 2. The experimental setup is described in Section 3. Experimental results and analysis are presented in Section 4. Cross match experimentation and analysis based on the proposed fusion approach are discussed in Section 5, followed by conclusions in Section 6.

Proposed system framework
In this paper, we present a framework for feature level fusion using canonical correlation analysis. Although the proposed framework applies to any biometric modality, we restrict it to the Fingerprint and Iris modalities of the same subjects. Both Fingerprint [2,3] and Iris [31,32] recognition offer high accuracy, reliability, and simplicity and are well accepted, making them very promising technologies for wide deployment compared to other biometric modalities. An overview of the framework is shown in Figure 1; it mainly consists of a training phase and a recognition phase. We learn canonical correlation features from the Iris and Fingerprint feature sets in the canonical space by adopting a correlation criterion function, and thereby devise effective discriminant representations. During the training phase, the transformation matrices (basis vectors) that project the Iris and Fingerprint feature sets into the canonical space are found. Then, by applying the summation method in the canonical space, fused feature vectors are created for the 'n' training samples and stored in the database. During the recognition phase, the features of the test sample are extracted and projected into the canonical space using the same transformation matrices, and the test fused feature vector is created by applying the summation method. This test fused feature vector is compared with the stored fused vectors to decide between match and non-match. In the following sections, we explain the fundamentals of CCA and show why it is suitable for information fusion at the feature level.

Canonical correlation analysis concept
CCA is a subspace learning method that learns a common representation by maximizing the correlation between two sets of feature vectors when projected onto a common space [19,20]. Consider two zero-mean feature sets $X = [X_1, X_2, \ldots, X_n]$ and $Y = [Y_1, Y_2, \ldots, Y_n]$, with $X \in \mathbb{R}^{p \times n}$ and $Y \in \mathbb{R}^{q \times n}$, obtained from the same $n$ subjects. As proposed by Hotelling [19], CCA computes linear transformations $W_x$ and $W_y$, one for each feature set, that make the corresponding variables in the two feature sets maximally correlated in the projected space. The information on associations between $X$ and $Y$ can be obtained from the within-set and between-set covariance matrices. The following correlation function [33] is maximized to find $W_x$ and $W_y$:

$$\rho = \max_{W_x, W_y} \frac{W_x^T C_{xy} W_y}{\sqrt{\left(W_x^T C_{xx} W_x\right)\left(W_y^T C_{yy} W_y\right)}} \tag{2.1}$$

Here, the within-set covariance matrices are denoted by $C_{xx} \in \mathbb{R}^{p \times p}$ and $C_{yy} \in \mathbb{R}^{q \times q}$, and the between-set covariance matrices by $C_{xy} \in \mathbb{R}^{p \times q}$ and $C_{yx} = C_{xy}^T$. Maximizing Eq (2.1) is equivalent to maximizing the numerator [20] subject to $W_x^T C_{xx} W_x = W_y^T C_{yy} W_y = 1$. This unit-variance constraint holds for each solution ($i = j$), while the canonical variates of different solutions ($i \neq j$) are uncorrelated. According to [20], the canonical correlations between $X$ and $Y$ are found by solving the eigenvalue equations

$$C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{yx} W_x = \rho^2 W_x, \qquad C_{yy}^{-1} C_{yx} C_{xx}^{-1} C_{xy} W_y = \rho^2 W_y \tag{2.2}$$

where the eigenvalues $\rho^2$ are the squared canonical correlations (collected in a diagonal matrix of eigenvalues) and the eigenvectors $W_x$ and $W_y$ are the normalised canonical correlation basis vectors. From Eq (2.2), the matrices $C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{yx}$ and $C_{yy}^{-1} C_{yx} C_{xx}^{-1} C_{xy}$ have the same eigenvalues but different eigenvectors, and under the normalisation above their solutions are related by [33]

$$C_{xy} W_y = \rho\, C_{xx} W_x, \qquad C_{yx} W_x = \rho\, C_{yy} W_y \tag{2.3}$$

The solution has $d = \operatorname{rank}(C_{xy}) \le \min(p, q)$ nonzero eigenvalues, sorted in decreasing order $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$. The correspondingly sorted eigenvectors form the transformation matrices $W_x$ and $W_y$, and $x = W_x^T X \in \mathbb{R}^{d \times n}$ and $y = W_y^T Y \in \mathbb{R}^{d \times n}$ are referred to as canonical variates or projected correlation features.
These canonical variates are uncorrelated within each feature set; across the two sets, they show nonzero correlation only between variates with the same index.
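As an illustration of the eigenvalue problem in Eq (2.2), the following is a minimal NumPy sketch, not the authors' implementation. It assumes feature matrices with one sample per column, and it solves the problem by whitening followed by a singular value decomposition, which is algebraically equivalent to Eq (2.2) but numerically more convenient. The function names `inv_sqrt` and `cca`, the ridge term `reg`, and the synthetic data are assumptions made for this example.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca(X, Y, reg=1e-6):
    """Return W_x (p x d), W_y (q x d) and the d canonical correlations.

    X is p x n and Y is q x n, one sample per column. `reg` is a small ridge
    term (an assumption of this sketch) that keeps C_xx and C_yy invertible.
    """
    n = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Cxx = Xc @ Xc.T / (n - 1) + reg * np.eye(X.shape[0])
    Cyy = Yc @ Yc.T / (n - 1) + reg * np.eye(Y.shape[0])
    Cxy = Xc @ Yc.T / (n - 1)
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical
    # correlations rho, already sorted in decreasing order.
    U, rho, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    d = min(X.shape[0], Y.shape[0])
    return Kx @ U[:, :d], Ky @ Vt.T[:, :d], rho[:d]

# Synthetic data: two shared latent signals plus independent noise dimensions.
rng = np.random.default_rng(0)
n = 200
shared = rng.normal(size=(2, n))
X = np.vstack([shared + 0.1 * rng.normal(size=(2, n)), rng.normal(size=(3, n))])
Y = np.vstack([shared + 0.1 * rng.normal(size=(2, n)), rng.normal(size=(2, n))])
Wx, Wy, rho = cca(X, Y)
print(np.round(rho, 3))
```

In this sketch the columns of `Wx` and `Wy` satisfy the unit-variance constraint by construction, so the projected variates have (approximately) identity covariance within each set and a diagonal cross-covariance holding the canonical correlations.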

Feature fusion
The graphical interpretation of the application of CCA in our experiment on Iris and Fingerprint images of the same subjects is shown in Figure 2. We are interested in learning the common representations contained in the Iris feature space and the Fingerprint feature space, which are reflected in the correlations between them. This is done by finding transformation matrices $W_x$ and $W_y$ that maximize the pair-wise correlations between the two sets. We apply CCA to project the extracted Iris and Fingerprint feature sets of the 'n' training samples into the canonical space. Within the canonical space, a fused feature may be obtained by either concatenation or summation [18]. Let $X = [X_1, X_2, \ldots, X_n] \in \mathbb{R}^{p \times n}$ and $Y = [Y_1, Y_2, \ldots, Y_n] \in \mathbb{R}^{q \times n}$ be the Iris and Fingerprint feature sets, respectively, with feature dimensions $p$ and $q$. After obtaining the eigenvectors $W_x$ and $W_y$, the projected features are

$$x = W_x^T (X - \bar{X}), \qquad y = W_y^T (Y - \bar{Y})$$

where $\bar{X}$ is the mean of the feature vectors $X$ and $\bar{Y}$ is the mean of the feature vectors $Y$. The fused feature vector $Z$ can then be computed by concatenation, Eq (2.4), or summation, Eq (2.5), of the transformed feature vectors [18]:

$$Z = \begin{bmatrix} W_x^T (X - \bar{X}) \\ W_y^T (Y - \bar{Y}) \end{bmatrix} \tag{2.4}$$

$$Z = W_x^T (X - \bar{X}) + W_y^T (Y - \bar{Y}) \tag{2.5}$$

We use the summation method, Eq (2.5), in the proposed approach to reduce computational complexity, since the vector length of the concatenation method is twice that of the summation method. During the training phase, the fused feature vectors $Z$ are stored as templates in the gallery. In the testing phase, the fused feature vector of the query sample can be classified using any classifier.
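The two fusion rules, Eq (2.4) and Eq (2.5), can be sketched as follows. This is an illustrative NumPy example under the assumption that samples are stored as columns; the function name `fuse`, the random projection matrices, and the dimensions are placeholders chosen for the example, not values from the experiments.

```python
import numpy as np

def fuse(X, Y, Wx, Wy, x_mean, y_mean, method="sum"):
    """Fuse two feature sets in the canonical space.

    X: p x n, Y: q x n, Wx: p x d, Wy: q x d; the means are p x 1 and q x 1
    and come from the training set.
    """
    x = Wx.T @ (X - x_mean)          # d x n canonical variates of X
    y = Wy.T @ (Y - y_mean)          # d x n canonical variates of Y
    if method == "concat":
        return np.vstack([x, y])     # 2d x n, concatenation
    return x + y                     # d x n, summation

# Placeholder data: p = 5, q = 4, d = 3, n = 6 (illustrative sizes only).
rng = np.random.default_rng(1)
X, Y = rng.normal(size=(5, 6)), rng.normal(size=(4, 6))
Wx, Wy = rng.normal(size=(5, 3)), rng.normal(size=(4, 3))
x_mean, y_mean = X.mean(axis=1, keepdims=True), Y.mean(axis=1, keepdims=True)

Z_sum = fuse(X, Y, Wx, Wy, x_mean, y_mean, method="sum")
Z_cat = fuse(X, Y, Wx, Wy, x_mean, y_mean, method="concat")
print(Z_sum.shape, Z_cat.shape)
```

The shapes make the trade-off concrete: the concatenated vector is twice as long as the summed one, which is the stated reason for preferring summation.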

Distance measures as a matcher
In this paper, the fused feature of the test image, $Z_t$, is matched against the gallery fused feature vectors $Z$ using three different measures for feature level fusion: Manhattan distance, Euclidean distance, and Cosine Similarity. By definition [34], the Euclidean and Manhattan distances express the distance between two vectors, taking magnitude into account. The cosine similarity measure considers only the angle between the vectors and discards the scaling of the magnitude; it also overcomes a limitation of the Euclidean distance, which is sensitive to outliers. At the matching stage, when a fused feature is classified in the canonical subspace rather than as a single vector, distance measures are no longer effective, and the angles between subspaces become a practical measurement. Hence, simple matchers are selected to make the matching process extremely fast and to study the performance of the multimodal system as contributed mainly by the CCA based feature fusion algorithm. With Manhattan distance, the matching image is found by comparing the test vector $Z_t$ with the training vectors $Z_i$ using

$$\arg\min_i \; \frac{1}{N} \sum_{k=1}^{N} \left| Z_t(k) - Z_i(k) \right| \tag{2.6}$$

where $N$ is used as a scaling factor and is equal to the length of the fused feature vector. As this distance measure does not take the shortest path possible, it yields a higher distance estimate. To find a match using Euclidean distance, we find the training image that satisfies Eq (2.7):

$$\arg\min_i \; \sqrt{\sum_{k=1}^{N} \left( Z_t(k) - Z_i(k) \right)^2} \tag{2.7}$$

With the Cosine similarity measure, the cosine of the angle between the test vector and each training vector is computed. The match for the test vector $Z_t$ is found by Eq (2.8); an angle closer to zero indicates a better match.

$$\arg\min_i \; \left( 1 - \frac{Z_t^T Z_i}{\lVert Z_t \rVert \, \lVert Z_i \rVert} \right) \tag{2.8}$$

The distance between $Z_t$ and $Z_i$ approaches 0 when the estimates are close to each other.
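The three matchers can be sketched in a few lines of NumPy. This is an illustrative example, assuming the gallery stores one fused template per column, that the Manhattan distance is scaled by the vector length N, and that the cosine measure is turned into a distance as one minus the cosine; the function names and the toy vectors are assumptions made for the example.

```python
import numpy as np

def manhattan(zt, Z):
    """Mean absolute difference between zt and each gallery column."""
    return np.abs(Z - zt[:, None]).sum(axis=0) / zt.size

def euclidean(zt, Z):
    """Euclidean distance between zt and each gallery column."""
    return np.sqrt(((Z - zt[:, None]) ** 2).sum(axis=0))

def cosine_distance(zt, Z):
    """One minus the cosine of the angle between zt and each gallery column."""
    cos = (Z.T @ zt) / (np.linalg.norm(Z, axis=0) * np.linalg.norm(zt))
    return 1.0 - cos

def match(zt, Z, measure):
    """Index of the gallery template with the smallest distance to zt."""
    return int(np.argmin(measure(zt, Z)))

# Toy gallery with three templates (columns) of length N = 4.
Z = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [2.0, -1.0, 0.0]])
zt = np.array([0.1, 0.9, 0.0, -1.1])   # noisy copy of template 1

for measure in (manhattan, euclidean, cosine_distance):
    print(measure.__name__, match(zt, Z, measure))
```

Rescaling `zt` by a positive constant leaves `cosine_distance` unchanged but alters the other two distances, which illustrates the insensitivity to magnitude discussed above.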

Multimodal biometric database
The SDUMLA-HMT Multimodal Database from Shandong University [35] comprises five biometric modalities of the same subject (person): Face, Finger vein, Gait, Iris, and Fingerprint. The database has a total of 106 subjects, 61 males and 45 females, with ages between 17 and 31 [35]. We chose the Iris and Fingerprint modalities for evaluation, with details shown in Table 1.

Image quality assessment
The recognition performance of a biometric system depends heavily on the quality of the biometric sample under consideration [36]. Information about the quality of the biometric sample prior to matching may be used to extract reliable features and boost the performance of a biometric recognition system [37].

Iris
In this work, Iris image quality is assessed using VASIR (Video-based Automatic System for Iris Recognition) [38], an Iris recognition software platform developed by NIST. VASIR uses the automatic image quality measurement (AIQM) method to generate scalar quality scores [37,38]. For each Iris image, the quality score is calculated as shown in Figure 3. For the SDUMLA database, the threshold is 14.73615, computed as Threshold = Average − ((Max − Min)/4) over the quality scores of the entire database. Iris images with a quality score greater than or equal to the threshold are selected for the experiments.
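As a small worked example of the threshold rule, the sketch below applies it to a hypothetical list of quality scores. The scores are made up for illustration and are not VASIR outputs; `quality_threshold` is a name chosen for this example.

```python
def quality_threshold(scores):
    """Threshold = Average - ((Max - Min) / 4) over all quality scores."""
    average = sum(scores) / len(scores)
    return average - (max(scores) - min(scores)) / 4

# Hypothetical quality scores (illustrative values only).
scores = [10.0, 20.0, 30.0, 40.0]
threshold = quality_threshold(scores)            # 25 - 30/4 = 17.5
selected = [s for s in scores if s >= threshold]
print(threshold, selected)
```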

Fingerprint
In this work, the NIST Fingerprint Image Quality algorithm (NFIQ) [39] is used to assess the quality of Fingerprints. NFIQ analyses a Fingerprint image and assigns it one of five quality levels, with '1' being the highest quality and '5' the lowest [40]. For each Fingerprint image, the quality level is calculated as shown in Figure 4. Images with a quality level of 1, 2, or 3 are selected for the experiments. Fingerprint images with an NFIQ quality level of 4 or 5 are considered bad Fingerprint images and are not recommended for enrollment for biometric purposes.

Preprocessing
The quality assessment of the Iris images shows that Iris images from the SDUMLA-HMT database have very low contrast between the sclera and the iris, which causes the iris segmentation to fail. Hence, an Iris image enhancement step is necessary before segmentation. The 768x576 gray level eye images are resized to 384x288, and contrast enhancement is performed using 'imadjust' and a log transformation. This results in a smoother transformation that mostly enhances useful details and thus improves segmentation. We then use the automatic Iris segmentation approach presented in [41] to extract the Iris region from the eye image. For perfect segmentation, the radius values are in the range of 78 to 148 pixels for the Iris and 14 to 58 pixels for the pupil.
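The enhancement step can be approximated in NumPy as follows. This is only a rough sketch: the percentile-based stretch stands in for MATLAB's 'imadjust', the 2x2 block averaging stands in for the resize, and the order of operations, the percentiles, and the function names `downsample2` and `enhance` are assumptions of this example rather than details given in the text.

```python
import numpy as np

def downsample2(img):
    """Halve both image dimensions by averaging 2x2 blocks."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def enhance(img, low_pct=1, high_pct=99):
    """Percentile contrast stretch followed by a log transformation.

    The output is scaled to the range [0, 1].
    """
    img = img.astype(np.float64)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    stretched = np.clip((img - lo) / max(hi - lo, 1e-12), 0.0, 1.0)
    return np.log1p(stretched) / np.log(2.0)

# Synthetic low-contrast image standing in for a 768x576 eye image
# (576 rows by 768 columns).
rng = np.random.default_rng(0)
eye = rng.integers(60, 120, size=(576, 768)).astype(np.uint8)
enhanced = enhance(downsample2(eye))
print(enhanced.shape, enhanced.min(), enhanced.max())
```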
After quality assessment using NFIQ, the Fingerprint images are resized to 256x256 pixels. The center point (the uppermost point of the innermost curving ridge) is then detected in the resized images. Translation invariance of the Fingerprint images is achieved by using this center point as a reference point. A circular region of interest around the reference point is determined and tessellated into concentric bands, and each band is further divided into sectors.

Feature extraction
To achieve good performance for both the unimodal Iris [42] and unimodal Fingerprint [43] systems, a fixed-length IrisCode and FingerCode are generated by extracting features from the preprocessed images using Gabor filters. For Iris images, the segmentation step is followed by normalization to make the Iris representation invariant to iris size and pupil dilation. The extracted iris is mapped into polar image coordinates of fixed dimensions 20(r) x 240(θ). These values are the radial and angular resolution of the normalized image, respectively, and represent a trade-off between noise removal and a reasonable template size. The normalized images are convolved with a log-Gabor filter for feature extraction. Encoding is then performed by mapping the phase response of the filter to one of the four quadrants of the complex plane and quantizing it to '0's and '1's. This encoded binary representation of the Iris image is referred to as the IrisCode. As per [41], the total number of bits in the IrisCode is the angular resolution times the radial resolution, times 2, times the number of filters. This produces a fixed-length (240*20*2*1) 9600x1 dimensional feature vector in binary form.
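The quadrant quantization can be illustrated with a short NumPy sketch. It assumes the log-Gabor response is already available as a complex array of size 20 x 240; a random complex array stands in for the real filter output here, and `encode_phase` is a name chosen for this example. Each sample contributes two bits, the signs of its real and imaginary parts, which identify the phase quadrant.

```python
import numpy as np

def encode_phase(response):
    """Quantise a complex filter response into two bits per sample.

    Bit 1 is set when the real part is non-negative, bit 2 when the
    imaginary part is non-negative.
    """
    bits = np.stack([response.real >= 0, response.imag >= 0], axis=-1)
    return bits.astype(np.uint8).reshape(-1)

# Random stand-in for the log-Gabor response of a 20 x 240 normalized iris.
rng = np.random.default_rng(0)
response = rng.normal(size=(20, 240)) + 1j * rng.normal(size=(20, 240))
iris_code = encode_phase(response)
print(iris_code.shape)   # 240 * 20 * 2 * 1 bits
```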
We use Gabor filters to capture the texture information of the preprocessed Fingerprint images at different orientations. Features are obtained by convolving the preprocessed images with Gabor filters at eight different orientations, as proposed in [43]. The advantages of using Gabor filters for Fingerprints are that they i) remove noise, ii) preserve the ridge and valley structures, iii) provide the information contained in a particular orientation, and iv) allow a minutia to be viewed as an anomaly in parallel ridges. This texture information is captured by computing the average absolute deviation from the mean of the gray values in the individual sectors of the filtered images, which forms the Fingerprint feature vector 'FingerCode'. In our experiment, we use a total of 5 concentric bands, each 18 pixels wide, with 16 sectors per band. Hence, a FingerCode of size 640x1 is formed from the selected parameters [No. of concentric bands * No. of sectors per band * No. of Gabor filters]. The generated feature vectors are real-valued. In this work, a static one-bit discretization scheme that uses simple threshold-based binarization for the quantization of a feature element [44] is implemented. For this, the feature mean of the entire training set is computed and set as the threshold. Applying quantization then yields a binary representation of the real-valued feature vector of 640x1 dimensions. The primary purpose of the discretization step is to allow the Hamming distance matcher to be used for FingerCode features as well.
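The sector tessellation, the average absolute deviation (AAD) features, and the one-bit quantization can be sketched as below. This is a hedged illustration, not the authors' code: random arrays stand in for the eight Gabor-filtered images, the innermost band is assumed to start at the reference point, the reference point is assumed to be the image center, and the threshold is taken as the per-element mean over the training set, which is one possible reading of the description. All function names are chosen for this example.

```python
import numpy as np

def sector_labels(shape, center, n_bands=5, band_width=18, n_sectors=16):
    """Label each pixel with band * n_sectors + sector, or -1 outside the ROI."""
    rows, cols = np.indices(shape)
    dy, dx = rows - center[0], cols - center[1]
    band = (np.hypot(dx, dy) // band_width).astype(int)
    theta = np.mod(np.arctan2(dy, dx), 2 * np.pi)
    sector = np.minimum((theta / (2 * np.pi) * n_sectors).astype(int), n_sectors - 1)
    labels = band * n_sectors + sector
    labels[band >= n_bands] = -1
    return labels

def fingercode(filtered_images, labels, n_cells):
    """AAD from the sector mean, for every sector of every filtered image."""
    feats = []
    for img in filtered_images:
        for cell in range(n_cells):
            vals = img[labels == cell]
            feats.append(np.abs(vals - vals.mean()).mean())
    return np.array(feats)

def binarize(features, train_features):
    """One-bit quantisation against the per-element training mean (assumption)."""
    threshold = train_features.mean(axis=1, keepdims=True)
    return (features >= threshold).astype(np.uint8)

rng = np.random.default_rng(0)
labels = sector_labels((256, 256), center=(128, 128))
# Random arrays stand in for the 8 Gabor-filtered images of 3 fingerprints.
train = np.column_stack([
    fingercode(rng.normal(size=(8, 256, 256)), labels, n_cells=80)
    for _ in range(3)
])
bits = binarize(train, train)
print(train.shape, bits.shape)   # 5 bands * 16 sectors * 8 filters = 640 rows
```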

Unimodal biometric system
In this work, the performance of the unimodal Iris recognition system and the unimodal Fingerprint recognition system is evaluated using a Hamming distance (HD) matcher. Using a single matcher for both modalities improves processing speed, reduces system complexity, and simplifies the design process. HD offers fast matching because it is computed only over the bits generated from the actual Iris region or Fingerprint region. Neither feature representation, IrisCode or FingerCode, is rotationally invariant. To make the recognition system rotation invariant, a circular shift of −15° to +15° is applied while calculating the HD for IrisCodes as well as for FingerCodes. The minimum HD over these shifts indicates the better match [41]. Further, unimodal system performance is also tested with the Manhattan, Euclidean, and Cosine Similarity measures.
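A minimal sketch of the shifted Hamming distance is given below. It assumes the binary code is stored as an array of shape (radial, angular, 2) so that a circular shift along the angular axis corresponds to a rotation; with 240 angular samples over a full circle, one column corresponds to 1.5°, so ±15° would be ±10 columns under this assumption. The function names and the random test codes are illustrative only.

```python
import numpy as np

def hamming(a, b):
    """Fraction of differing bits between two binary arrays of equal shape."""
    return np.count_nonzero(a != b) / a.size

def min_shifted_hamming(code_a, code_b, max_shift):
    """Minimum HD over circular shifts of code_b along the angular axis."""
    return min(
        hamming(code_a, np.roll(code_b, shift, axis=1))
        for shift in range(-max_shift, max_shift + 1)
    )

# A random code and a copy rotated by 7 angular columns (10.5 degrees).
rng = np.random.default_rng(0)
code_a = rng.integers(0, 2, size=(20, 240, 2), dtype=np.uint8)
code_b = np.roll(code_a, 7, axis=1)

hd_wide = min_shifted_hamming(code_a, code_b, max_shift=10)    # covers the rotation
hd_narrow = min_shifted_hamming(code_a, code_b, max_shift=3)   # does not cover it
print(hd_wide, hd_narrow)
```

When the shift range covers the rotation, the minimum HD drops to zero for identical codes; otherwise it stays near the value expected for unrelated random bits.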

Feature fusion
We first perform dimensionality reduction on the extracted feature vectors of the multimodal Iris and Fingerprint samples using principal component analysis (PCA). This helps to minimize the computational cost of the training phase and to avoid the small sample size problem [18]. In PCA, the upper bound on the feature vector length is the number of nonzero eigenvalues, which equals the total number of images minus one for each modality. In this work, we reduce the Iris feature vector of 9600x1 dimensions and the Fingerprint feature vector of 640x1 dimensions to two reduced feature vectors of the same dimension (e.g., feature dimensions of 235x1 for right images). In the training phase, the reduced feature vectors of Iris and Fingerprint are further processed by CCA as shown in Figure 5. The two projection matrices $W_x$ and $W_y$ and the fused feature vectors Z are obtained as defined in Eq (2.5), and $W_x$, $W_y$, and Z are stored in the database as the template. In the testing phase, the test sample features are first projected into the canonical space using the same projection matrices $W_x$ and $W_y$. The test fused feature vector $Z_t$ is then created by applying the summation method, Eq (2.5). This test fused feature vector $Z_t$ is compared with the fused vector templates Z for matching, based on the different distance or similarity measures described by Eqs (2.6), (2.7) and (2.8).
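The PCA step and its "total images minus one" bound can be illustrated as follows. This sketch assumes one training sample per column and uses random data with 640 features and 236 samples (59 subjects with 4 images each) as a stand-in for real FingerCodes; `pca_fit` and `pca_transform` are names chosen for the example.

```python
import numpy as np

def pca_fit(X, k):
    """Return the training mean, the first k principal directions, and all
    singular values of the centered data (X is D x n, one sample per column)."""
    mean = X.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(X - mean, full_matrices=False)
    return mean, U[:, :k], s

def pca_transform(X, mean, basis):
    """Project samples onto the retained principal directions."""
    return basis.T @ (X - mean)

# Stand-in training set: D = 640 features, n = 59 * 4 = 236 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(640, 236))
mean, basis, s = pca_fit(X, k=235)
reduced = pca_transform(X, mean, basis)

# After centering, at most n - 1 = 235 singular values are nonzero.
print(reduced.shape, int(np.sum(s > 1e-8 * s[0])))
```

At test time, the same `mean` and `basis` from training would be reused to reduce the query features before they are projected by CCA.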

Performance evaluation: Right Iris and Right Fingerprint
The recognition performance of the proposed feature fusion method is evaluated on the Right Iris and Right thumb Fingerprint images of the multimodal database in order to rigorously test the designed framework and algorithm. Based on the quality results and the added constraints of correct Iris segmentation and correct detection of the central point of the Fingerprint, only 59 of the 106 subjects have both modalities available and are selected. For both modalities, we use the first 4 images per subject in the training set (59 classes in total, with 4 impressions per class) and the remaining images for testing. Thus, across both modalities, a total of 2*59*4 = 472 images are used for training, with a total of 354 intra-class comparisons (genuine trials) and 27376 inter-class comparisons (imposter trials).
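The genuine and imposter trials are summarized by the equal error rate (EER), the operating point at which the false accept rate equals the false reject rate. The sketch below shows one common way to estimate it from distance scores; it is not necessarily the procedure used by the authors. The score distributions are synthetic, and only the trial counts (354 and 27376) are taken from the protocol above.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Estimate the EER for distance scores (smaller means a better match)."""
    genuine = np.sort(np.asarray(genuine, dtype=float))
    impostor = np.sort(np.asarray(impostor, dtype=float))
    thresholds = np.concatenate([genuine, impostor])
    # FRR: fraction of genuine scores above the threshold (falsely rejected).
    frr = 1.0 - np.searchsorted(genuine, thresholds, side="right") / genuine.size
    # FAR: fraction of impostor scores at or below the threshold (falsely accepted).
    far = np.searchsorted(impostor, thresholds, side="right") / impostor.size
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0

# Synthetic distance scores with the same trial counts as the protocol.
rng = np.random.default_rng(0)
genuine = rng.normal(0.2, 0.05, size=354)
impostor = rng.normal(0.5, 0.05, size=27376)
eer = equal_error_rate(genuine, impostor)
print(f"EER = {100 * eer:.4f}%")
```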
The experimental results for the Right Iris unimodal system and the Right Fingerprint unimodal system are presented in Table 2. EERs of 1.9762% and 2.7287% are obtained for the individual Iris recognition system and Fingerprint recognition system, respectively, using the Hamming distance matcher. Furthermore, for a fair comparison, we apply PCA to the features extracted from the individual modalities (IrisCode and FingerCode) and perform recognition using the principal components. Table 2 shows the performance of the individual Iris and Fingerprint recognition systems, in terms of EER, for the Manhattan, Euclidean, and Cosine Similarity metrics, and the corresponding ROC curves are shown in Figure 7. The experimental findings for feature level fusion on the Right Iris and Right thumb Fingerprint using the PCA, CCA, and PCA+CCA approaches are shown in Table 3. The results demonstrate that the PCA+CCA approach achieves competitive recognition performance with low computational complexity. Three distinct matchers are used to assess the performance of the proposed feature level fusion. The resulting EERs are 0.5698% for Manhattan distance, 0.2813% for Euclidean distance, and 0.2812% for Cosine Similarity. Thus, the proposed PCA+CCA feature level fusion approach outperforms both PCA feature fusion and CCA feature fusion for the Iris and Fingerprint modalities, as shown by the achieved EERs. Therefore, except in Table 3, the proposed PCA+CCA approach is referred to as CCA based feature fusion throughout the paper. For a clear comparison, Figure 6 shows the match score distributions for the unimodal and multimodal systems. The ROC curves in Figure 7(b) show that the PCA+CCA approach with the cosine similarity measure consistently outperforms the other matchers.
This indicates that the PCA+CCA approach (referred to as CCA based feature fusion) not only reduces dimensionality while fusing the correlated features of the two modalities but also achieves higher recognition accuracy. We also note that, in practice, the Euclidean and Manhattan metrics, which depend on the magnitude of the vectors, are less able to capture the intrinsic similarities between images, while cosine similarity offers stability to noise and is insensitive to global scaling of the vector magnitude. The cosine similarity metric thus enhances the robustness of the fused feature and implies good generalization ability, which is one possible reason for its superior performance.
In this work, we also compare the performance of the proposed feature level fusion with score level fusion. The fusion of the matching scores obtained from the Hamming distance matcher for Right Iris and Right Fingerprint images is implemented using classic rules, namely the Sum rule and the Weighted Sum rule [45]. The sum rule is an extensively used and efficient fusion scheme [45,46], capable of effectively combining the scores provided by multiple matchers using a weighted sum. The fusion score $S_{fuse}$ for simple weighted fusion over $N$ matchers or classifiers is computed using Eq (4.1):

$$S_{fuse} = \sum_{i=1}^{N} W_i S_i \tag{4.1}$$

For two modalities, $N = 2$ and Eq (4.1) becomes $S_{fuse} = W_1 S_1 + W_2 S_2$, where $S_1$ and $S_2$ are the Iris and Fingerprint match scores, respectively, and $W_1$ and $W_2$ are their weights. The weights are varied over the range [0, 1] subject to the constraint $W_1 + W_2 = 1$ [45]. The scores of different biometrics can be weighted differently; for example, since the error rate of Iris is lower than that of Fingerprint, the Iris score may be assigned a greater weight than the Fingerprint score. Finally, the fused matching score is used to classify an individual as genuine or an imposter. The experimental result is presented in Table 4. We empirically selected the weights for match score level fusion with the weighted sum method by seeking the maximum recognition accuracy with each matcher; the set of weights giving the lowest equal error rate is used. After experimenting with different weight values, the weights for the individual matchers were fixed to the same value: 0.5 for W1 and 0.5 for W2. Normally, each matcher's weight is determined by its recognition performance on a training set.
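The weighted sum rule is simple enough to show as a short worked example. The scores below are hypothetical Hamming distances chosen for illustration; only the equal weights of 0.5 are taken from the text, and `weighted_sum_fusion` is a name chosen for this example.

```python
def weighted_sum_fusion(scores, weights):
    """Weighted sum of match scores; the weights are required to sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, scores))

# Hypothetical Hamming distance scores for one comparison (illustrative only).
iris_score, fingerprint_score = 0.30, 0.40
fused = weighted_sum_fusion([iris_score, fingerprint_score], [0.5, 0.5])
print(fused)   # 0.5 * 0.30 + 0.5 * 0.40
```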
In this work, the Hamming distance matcher is used for both modalities so that the output scores of both systems are in the same format, which eliminates the need for additional normalization techniques and complex fusion matchers. Figure 8(b) compares the EER performance of score level fusion with that of feature level fusion. The ROC curve shows that CCA based feature level fusion significantly outperforms the match score level fusion approach.

Cross match experimentation and analysis
We performed an experiment to evaluate the effect of cross matching biometric feature fusion using Iris and Fingerprint modalities that are strictly captured from the same person (subject). To study the effect of cross matching on performance in the true sense, we selected Iris and Fingerprint images of persons who are present in both the earlier left and right experiments. The image selection protocol remains the same as stated earlier: selection is based on the quality results and the added constraints of correct Iris segmentation and correct detection of the central point of the Fingerprint. A total of 45 subjects satisfied the image selection protocol in the Left Iris and Left Fingerprint experiment, and a total of 59 subjects in the Right Iris and Right Fingerprint experiment. Among these 45 and 59 subjects, only 35 common subjects have both modalities in both experiments and are selected. For both modalities, we use the first 4 images per subject in the training set (35 classes in total, with 4 impressions per class) and the remaining images for testing. For training, a total of 2 * 35 * 4 = 280 images across both modalities are used. There are a total of 210 intra-class comparisons and 9520 inter-class comparisons. We performed the following two cross matching experiments, and the evaluation performance is summarised in Table 5.

Performance evaluation for Left Iris and Right Fingerprint
In this experiment, the Left Iris and the Right Fingerprint of 35 subjects are used to perform cross matching feature fusion. For unimodal Left Iris recognition and unimodal Right Fingerprint recognition with the Hamming distance matcher, the EERs are 0.9559% and 3.1513% respectively. For CCA based feature fusion using Left Iris and Right Fingerprint, we observed an EER of 1.4286% for Cosine Similarity, 0.1471% for Euclidean distance, and 0.3466% for Manhattan distance. Figure 9 shows the ROC curves with the different matchers. For the feature fusion approach, the cosine similarity measure shows a significant degradation in EER performance compared to the other matchers. In this cross matching experiment, the multimodal features have different discriminating power, which could further limit the discriminability of the fused result. This implies that, performance-wise, fusing a strong Left Iris modality with a weak Right Fingerprint modality at the feature level does not guarantee a result as encouraging as those obtained in the earlier experiments. It also indicates that even if the Iris and Fingerprint modalities are of the same person, there is a certain close relationship, possibly a genetics based one, that directly affects and dominates the performance [47]. This suggests that feature dependency should be taken into account while designing a multimodal system, as it affects the system performance.
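For reference, the three matchers compared above can be written compactly as follows. This is a minimal sketch operating on two fused feature vectors of equal length; the vectors and function names are hypothetical. Note the different directions of the measures: cosine similarity grows with a better match, whereas the Euclidean and Manhattan distances shrink.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """L2 distance between u and v (0.0 = identical vectors)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """L1 distance between u and v (0.0 = identical vectors)."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Hypothetical fused feature vectors for an enrolled template and a probe.
template = [0.8, 0.1, 0.3]
probe = [0.7, 0.2, 0.3]
scores = {
    "cosine": cosine_similarity(template, probe),
    "euclidean": euclidean_distance(template, probe),
    "manhattan": manhattan_distance(template, probe),
}
```

Cosine similarity ignores vector magnitude while the two distances do not, which is one reason the matchers can rank the same fused features differently.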

Performance evaluation for Right Iris and Left Fingerprint
In this experiment, the Right Iris and Left Fingerprint of 35   respectively. For CCA based feature fusion using Right Iris and Left Fingerprint, we observed an EER of 0.1050% for Cosine Similarity, 0.1786% for Euclidean distance, and 0.4307% for Manhattan distance. Figure 9(b) shows the ROC curves with the different matchers. For the feature fusion approach, the EER with the cosine similarity measure is significantly better than with the other matchers. This experiment shows that, performance-wise, when a strong Right Iris modality is fused with a moderate Left Fingerprint modality at the feature level, it is possible to obtain results consistent with those of the earlier experiments. It suggests that the concept of laterality should be considered while implementing the matching algorithm to improve the verification performance of the multimodal system [47]. Here again, the result indicates that feature dependency should be taken into account while designing a multimodal system, as it directly affects the system performance.

Comparison with existing methods
Compared with earlier work based on feature fusion and match score fusion, our algorithm shows encouraging performance among typical algorithms. As only a limited number of previous studies have used the SDUMLA-HMT database, we also compare our approach with work on other real multimodal datasets. For example, an efficient fusion scheme at the feature and match score levels to combine face and palmprint modalities is presented in [46]. The authors performed feature selection and fusion using a binary particle swarm optimization (PSO) technique and achieved the best GAR (Genuine Acceptance Rate) of 97.25% at a FAR (False Acceptance Rate) of 0.01% for hybrid fusion. They claim that the use of PSO helps to reduce the number of feature dimensions and the complexity of the multimodal system. A multimodal sparse representation algorithm with feature level fusion for Fingerprint and Iris modalities is explored in [10]. This approach uses a sparse linear combination of training data to represent the test data. A quality measure for fusion based on the joint sparse representation and a kernel technique is presented to achieve recognition robustness. The experimental evaluation demonstrates a rank-1 recognition rate of 98.7%, indicating a significant improvement in the performance of a multimodal system. Another work [24] considers a feature level fusion strategy for multimodal recognition based on Discriminant Correlation Analysis (DCA). This fusion method takes into account the class relationships of the feature sets, removing correlations between classes while restricting correlations to within classes. Using DCA-based feature fusion and a minimum distance classifier, a rank-1 recognition rate of 99.60% is attained for the multimodal system. The Group Sparse Representation based Classifier (GSRC) approach, which integrates multi-feature representation seamlessly into classification, is studied in [14].
This approach uses the feature vectors extracted from different modalities to perform accurate identification with feature level fusion and classification. The authors reported the efficacy of the proposed approach in terms of the rank-1 recognition rate. This approach has the benefit of efficiently handling multimodal biometrics and multiple types of features in a single framework. We found one previous work [17] that used the SDUMLA-HMT database to investigate a multimodal system using the Iris, Face, and Finger Vein modalities, so this work is considered for comparison. It adopts a feature level fusion strategy in which convolutional neural networks (CNNs) extract the features and a softmax classifier classifies the images. A pretrained VGG-16 model was used to develop the CNN model, which achieved an accuracy of 99.39%.
The experimental findings of our proposed approach show that feature level fusion based on CCA is useful in identifying the most correlated features between the two feature sets of Iris and Fingerprint. Furthermore, our method is equally powerful in representing the fused feature vector, referred to as the canonical correlation discriminant vector, and in reducing the probability of a false match. Thus, the proposed multimodal biometric system can improve the universality, accuracy, and security of a verification system with due consideration of cross match modalities. To the best of our knowledge, no previous study has used the SDUMLA-HMT database to examine the performance of a multimodal system with cross match modalities. Our prototype model ran on a PC with a 3.10 GHz processor and 8 GB of RAM. For Right Iris and Right Fingerprint, the training time is 0.145945 seconds and the testing time is 0.012539 seconds per person. The comparative result analysis of our proposed approach with existing approaches is shown in Table 6.

Conclusions
In this paper, an optimal feature level fusion model based on CCA is presented to extract and represent discriminative features by exploring significant relationships between the Iris and Fingerprint feature sets of the same person. The performance is evaluated for different distance and cosine similarity measures on the SDUMLA-HMT multimodal database in a verification scenario. From the experimental results of CCA based feature level fusion with the Cosine Similarity matcher, we found significantly improved recognition performance compared to unimodal systems, in terms of equal error rate (EER), using a) Right Iris and Right Fingerprint images (EER of 0.2812%) and b) Right Iris and Left Fingerprint images (EER of 0.1050%), but significantly poorer recognition performance using c) Left Iris and Right Fingerprint images (EER of 1.4286%). This suggests that the concept of laterality should be considered while implementing the matching algorithm to improve the verification performance of the multimodal system. Further, feature dependency should be taken into account while designing a multimodal system, as it affects the system performance. Cross matching is a novel area that merits deeper investigation in multimodal systems. We have obtained interesting initial results, but further exploration should be done with a larger database. This paper offers new perspectives for designing a feature level fusion model for multimodal systems in which the Iris and Fingerprint modalities are efficiently represented in canonical space. However, to take full advantage of feature level fusion and to uncover the deep-rooted relations between the features of cross matched modalities, further work is needed on designing an intelligent matcher framework at the matching level as well. To further enhance the robustness of the proposed approach, we intend to investigate geometric consistency for