High Security Finger Vein Recognition Based on Robust Keypoint Correspondence Clustering

Finger vein recognition has proven to be an effective biometric pattern for personal verification in terms of its convenience and security. However, existing works on finger vein recognition have neglected its application scenarios and treated the false acceptance rate (FAR) and the false rejection rate (FRR) equally, i.e., they have utilized the equal error rate (EER) as the main evaluation criterion. As a structure hidden beneath the skin, the finger vein pattern is usually applied in access control rather than forensics. Hence, the security requirement of finger vein recognition should be high, i.e., the FRR should be reduced under the premise of an extremely low FAR. In our opinion, the key point and main difficulty in achieving high security recognition is enlarging the difference between genuine and imposter matchings. In this paper, a finger vein recognition framework based on robust keypoint correspondence clustering is proposed to achieve high security recognition. A scale-invariant feature transform (SIFT) descriptor-based method is utilized as the base recognizer. Then, a multi-input multi-output (MIMO) matching structure is designed according to different physical characteristics of the finger vein images to enhance the matching possibilities. After that, the matching pairs of each correspondence (i.e., the matching of two images) are clustered according to the deformation information of each matching pair by a novel simulated clustering technique. Finally, the matching score is defined as the number of matching pairs after clustering. Extensive experiments on the HKPU and FV-SDUMLA-HMT open databases demonstrate the superior performance of the proposed method, with FRRs-at-0-FAR of 0.0139 and 0.2377, respectively, which imply the applicability of the proposed method in high security scenarios. The corresponding EERs are 0.0015 and 0.0139, and the rank-one recognition rates are 99.91% and 97.54%, respectively, which are comparable to the state-of-the-art methods and further indicate the effectiveness of the proposed method.


I. INTRODUCTION
With the increasing demand for information security, the selection of reliable and robust biometric traits for identity authentication has become a common concern in both scientific and industry communities [1], [2]. The finger vein, which is the tree-like structure hidden underneath the skin, was proven to be individually different and was applied in pattern recognition approximately twenty years ago [3]. The finger vein is characterized by its security against spoofing attacks and convenience of conforming to human habits of using hands [4], [5]; therefore, it has attracted tremendous attention from researchers [6] and has been widely adopted in commercial applications [7], [8].
The associate editor coordinating the review of this manuscript and approving it for publication was Joewono Widjaja.
Numerous methods have been studied to accomplish finger vein recognition in view of its distinct characteristics. However, capturing images with near infrared (NIR) rays may limit performance improvement [9]-[12]. More specifically, the quality of finger vein images usually varies in contrast and intensity distribution because of optical blurring or scattering [9], [10], and the images also suffer from nonnegligible deformations introduced by the noncontact capturing method and the nonrigidity of fingers [11], [12]. Many works have been conducted to address these issues, which can broadly be divided into four categories [13] according to feature extraction.
1) The local pattern-based method usually extracts pixel-level texture features from preliminarily segmented regions of interest (ROIs) [14] and then attempts to identify a finger through pixel-to-pixel matching [15], [16]. Classical local pattern-based methods include local binary patterns (LBPs) [15] and their variants, local line binary patterns (LLBPs) [16], local directional codes (LDCs) [17], etc. The pixelwise local features are mostly fine in granularity and thus are dense and weak in discriminability. In addition, pixel-to-pixel matching needs careful ROI segmentation because of its sensitivity to deformations [11]. Improved techniques of feature selection and matching are designed to address these defects [18]. Some of the methods even employ machine learning algorithms to conduct feature transformation and selection [19], [20].
2) The keypoint-based method extracts features from interest points and searches the best matches in a one-to-many strategy. Existing keypoints can be subdivided into (a) physically comprehensible minutiae, such as cross and end points of vessels [13], [21], and (b) automatically extracted keypoints, such as extreme points of intensity or eigenvalues [10], [22]. Corresponding works include modified Hausdorff distance (MHD) with minutiae feature matching [21], singular value decomposition (SVD)-based minutiae matching (SVDMM) [13], methods based on the scale-invariant feature transform (SIFT) [10] and deformation-tolerant-based feature point matching (DT-FPM) [22]. The keypoint-based method is universally deemed deformation tolerant due to its sparsity of features and one-to-many matching strategy; however, it is not thoroughly explored because of the possible insufficiency of keypoint numbers and ambiguity of physical meanings.
3) The vein structure-based method, just as 'finger vein recognition' suggests, utilizes the intrinsic vessel structures to achieve recognition by first segmenting the vasculatures and then matching them according to the topology or local features. Traditional vein structure-based methods are mainly based on region growth [23], repeated line tracking (RLT) [24], mean curvatures (MeanC) [25], maximum curvature points (MCP) [26], Gabor filters [27], etc. Vessel segmentation is crucial for subsequent feature extraction or matching, and further improvement is necessary due to the effect of the image quality. Moreover, these methods are also sensitive to deformations, which can result in performance reduction. To address these problems, many vessel segmentation and matching techniques have been proposed [28], which involve convolutional neural network (CNN)-based methods [29], [30].
4) The learning-based method employs machine learning algorithms to obtain transformation matrices or classification models for recognition. Typical methods are based on principal component analysis (PCA) [31] or its variants [32], support vector machines (SVMs) [33], etc. Recently, CNNs have been introduced to finger vein recognition but are mostly utilized to segment vessels [29], [30]. Learning-based methods may be inadequate in taking advantage of topological information and may be time-consuming. Moreover, additional training sets are required, which may not always be available in real applications and are device-dependent.
The existing methods have greatly improved the performance of finger vein recognition. However, state-of-the-art methods have neglected the fact that different application scenarios of finger vein recognition impose different requirements. The conventional application scenarios of biometrics are high security systems and forensics. In high security systems, such as access control or identity verification, intruders must be rejected to maintain safety; a user can be validated if and only if a corresponding registration exists, which means that false rejections are tolerable while false acceptances are forbidden. In forensics, every suspect should be retrieved, which means that false acceptances are tolerable while false rejections are harmful. Most of the existing methods treat the false rejection rate (FRR) and the false acceptance rate (FAR) equally, i.e., they utilize the equal error rate (EER) as the main evaluation criterion. In fact, as a structure hidden underneath the skin, the finger vein pattern can hardly be used in the way that fingerprints are used for criminal investigation or deoxyribonucleic acid (DNA) is used for genetic relationship identification. Thus, the finger vein pattern is mostly utilized in high security applications rather than forensics, and the FRR at an extremely low FAR (such as FRR-at-0-FAR) is the principal evaluation criterion.
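To make the difference between the two criteria concrete, the following minimal sketch computes the FRR-at-0-FAR and an approximate EER from two lists of matching scores. It assumes that a higher score means a better match and that a comparison is accepted when its score reaches the threshold; the function names and the toy scores are illustrative and are not taken from the experiments in this paper.

```python
def far_frr(genuine, imposter, threshold):
    """Return (FAR, FRR) when a comparison is accepted iff score >= threshold."""
    far = sum(s >= threshold for s in imposter) / len(imposter)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def frr_at_zero_far(genuine, imposter):
    """FRR at the loosest threshold that still rejects every imposter."""
    # Such a threshold lies just above the highest imposter score, so every
    # genuine score at or below that value is falsely rejected.
    highest_imposter = max(imposter)
    return sum(s <= highest_imposter for s in genuine) / len(genuine)


def equal_error_rate(genuine, imposter):
    """Approximate EER: mean of FAR and FRR where the two are closest."""
    candidates = sorted(set(genuine) | set(imposter))
    candidates.append(candidates[-1] + 1)  # threshold that rejects everything
    rates = [far_frr(genuine, imposter, t) for t in candidates]
    far, frr = min(rates, key=lambda pair: abs(pair[0] - pair[1]))
    return (far + frr) / 2


if __name__ == "__main__":
    genuine_scores = [12, 9, 15, 3, 20]   # toy values, not experimental data
    imposter_scores = [0, 1, 2, 4, 1]
    print(frr_at_zero_far(genuine_scores, imposter_scores))
    print(equal_error_rate(genuine_scores, imposter_scores))
```

The EER balances the two error types, whereas the FRR-at-0-FAR is determined entirely by the single highest imposter score, which is why enlarging the margin between genuine and imposter scores matters for high security use.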
The essential goal of achieving high security finger vein recognition is to enlarge the interval between genuine and imposter matching scores. In our opinion, this can be achieved by updating the matching scores according to additional features of one correspondence, i.e., one matching between two images. In this paper, a finger vein recognition framework based on robust keypoint correspondence clustering is designed to achieve high security performance. In the proposed framework, a keypoint-based method based on the scale-invariant feature transform (SIFT) descriptor is utilized as the base recognizer. Then, according to different image features, a multi-input multi-output (MIMO) fusion strategy is designed to assemble various matching possibilities, eliminate the influences of image quality and enhance the physical meanings of keypoints. After that, the matching pairs of each correspondence are characterized by additional features of deformation information for further description. Finally, the matching pairs of each correspondence are clustered using a simulated clustering algorithm according to the additional features to eliminate false matching pairs. The number of final matching pairs is defined as the matching score. Extensive experiments have been conducted on two publicly available databases, HKPU and FV-SDUMLA-HMT, to verify the performance of the proposed method. The FRRs-at-0-FAR on the two databases are 0.0139 and 0.2366, which demonstrate the good performance of the proposed method in high security applications. The frequently adopted evaluation criteria of EER and rank-one recognition rate are also tested; the EERs are 0.0015 and 0.0139, and the rank-one recognition rates are 99.91% and 97.54%, respectively, which also compare favorably with the state-of-the-art methods.
The contributions of this work are summarized as follows: (1) We design a finger vein recognition framework for high security scenarios based on simulated clustering of matching pairs in each correspondence, which greatly reduces the main criterion of FRRs-at-0-FAR with the EERs and rank-one recognition rates greatly optimized simultaneously on the two experimental databases. (2) A MIMO fusion strategy is presented, which enhances the physical meanings of the extracted keypoints with vessel possibility matrices, curvature maps and stretched intensities. The keypoints are diverse and complementary among the groups with the keypoint numbers improved. (3) Additional features are extracted to represent the matching correspondences, i.e., the matched keypoint pairings. These features are in accordance with the image-to-image deformations and hence can differentiate true matching pairs from false pairs. (4) A novel technique to simulate one-class clustering is proposed to further eliminate false matching pairs; hence, the margin of the genuine and imposter matching is enlarged.
The rest of this paper is organized as follows. Section II thoroughly analyzes the characteristics of the keypoint-based method through comparisons with the existing methods. The ideas of the proposed MIMO strategy and simulated one-class clustering are also presented. Section III introduces the details of the proposed method. Section IV reports on the empirical study. Section V discusses some observations and concludes this paper.

II. RELATED WORK ANALYSIS AND THE CONCEPTION OF HIGH SECURITY RECOGNITION
In this section, our reason for utilizing the keypoint-based method is first introduced by analyzing the characteristics of the state-of-the-art methods. Then, the defects of the keypoint-based method and the improvement strategy are presented. After that, the problem of high security recognition is stated, and the rationale for the proposed simulated one-class clustering is presented.

A. RELATED WORK ANALYSIS

1) THE REASON FOR UTILIZING THE KEYPOINT-BASED METHOD
In this part, the existing methods are analyzed and compared at the feature level. As stated before, finger vein methods can be divided into local pattern-based, keypoint-based, vessel-based and automatically learned methods according to the extracted features. Among the four categories, we believe the keypoint-based features are advantageous over other features in terms of the following aspects: discriminability, repeatability, accuracy, quantity, granularity and efficiency, as stated in Table 1. In the table, discriminability refers to the uniqueness and variety of the features; the features should be distinctive, and different features should be scored differently. Repeatability indicates the invariance and robustness of the extracted features against deformations and noise; the features should be detected even in different views. Accuracy indicates that the feature can be precisely localized despite the influence of scales. Quantity means the number of features, which should be sufficient to represent each finger vein image. Granularity implies the scales of the features. Efficiency denotes the time efficiency of feature extraction and matching. The finger vein recognition methods are analyzed and compared according to these six aspects.
The local pattern-based method is also called the ROI-based method because it extracts pixelwise features (such as gradients) in preliminarily segmented ROIs. Hence, a feature matrix is generated in accordance with pixel positions. Consequently, two images are matched basically according to the feature matrices in a pixel-to-pixel manner. One can determine that the local pattern-based features are dense with fine granularity; hence, the quantity of the features is large, and the discriminability of these features is relatively unsatisfactory. However, the efficiency of the extraction and matching of the features is always high. Moreover, influenced by the deformations of fingers, the repeatability of features extracted from each pixel may be questionable in terms of robustness and invariability. Similar features can appear in different positions, so the accuracy of the features may also be unsatisfactory.
Vessel-based features are generally defined according to preliminarily segmented vessel structures. Finger veins are the main structures in finger vein images; thus, using vessel structures to conduct finger vein recognition is the most intrinsic method and is preferred in industrial applications. Vessel structures are stable structures under deformations in a certain interval and are distinctive for recognition, but they may be influenced by large deformations or image quality problems. Moreover, the vessels are always similar in structure and topology. Thus, the repeatability and accuracy of vessel-based features are sometimes questionable. The segmentation of vessels is usually important for subsequent feature extraction and matching and is frequently researched. The density of vessels is crucial, and the vein-based method is sometimes time-consuming, especially when involving a complex segmentation procedure.
Automatically learned features refer to features learned or selected through machine learning algorithms. Methods based on learned features are usually high in time consumption because of model training and testing. Moreover, the features are always pixel- or patch-based, so the discriminability of the learned features depends strongly on the base features and is greatly improved by the learning and testing procedures of machine learning algorithms. However, the trained models may overfit the involved datasets. The models are also sensitive to deformations because positional information is hard to encode in the model.
Keypoint extraction usually consists of two stages, i.e., keypoint detection and descriptor assignment. The keypoints should first be localized according to stable position markers, i.e., extreme points of intensity or key positions on vessel structures. Then, descriptors are assigned to each key position to represent this keypoint. After that, the keypoints are matched in a one-to-many procedure. Although the extracted keypoints are relatively small in quantity and comprehensible in physical meaning, keypoint-based methods can be relatively time-consuming because of complex extraction and subsequent matching. Nevertheless, the descriptors are coarse in granularity and usually invariant to translations and rotations, so keypoints are relatively higher in discriminability and more accurate in position.
From the above analysis, we can determine that the keypoints are superior according to the criteria listed in Table 1.
The keypoint-based method shows high discriminability and is flexible to deformations. Moreover, it is usually efficient with a fair number of keypoints, which also saves considerable storage space. According to the comparisons, we surmise that the keypoint-based method warrants further study. It is noteworthy that early minutiae are dependent on segmented vessels, which are sensitive to image quality and always problematic; therefore, we concentrate on automatic keypoint-based techniques. The SIFT descriptor is adopted in this paper because it is universally utilized in object recognition and can be easily compared by researchers.

2) PROBLEMS OF THE KEYPOINT-BASED METHOD
Despite all the advantages of the keypoint-based method, its performance is still far from perfect. Through comprehensive analysis, some of the shortcomings are concluded in the following three aspects.
First, keypoint localization is the preliminary work of keypoint extraction. However, the importance of keypoint detection is always neglected. Influenced by the obscurity, overexposure and low contrast of finger vein images, keypoint localization becomes difficult. Keypoints, such as extreme points and cross and end points on vessels, are often scarce with unstable positions. Moreover, the locations of keypoints should be physically meaningful, while random noise may introduce uncertainties to keypoint localization.
Second, the keypoints are characterized by features extracted around the keypoints called keypoint descriptors. The distinctiveness of descriptors greatly depends on the image qualities and the stability of keypoint positions. For example, most of the descriptors are extracted in terms of image gradients and applied to natural images. However, the contrast of finger vein images is inferior, which results in unacceptable feature discriminability.
Third, keypoint matching is the final procedure in the kernel algorithm of finger vein recognition. There are always false keypoint pairings when the distinctness of the keypoint descriptors is unsatisfactory. Moreover, the one-to-many keypoint pairing strategy of the existing methods may result in one keypoint erroneously being matched to multiple keypoints, while false pairing removal can also disturb the true pairings. In addition, the positions and matching relationships of the matched keypoints are informative but are not effectively utilized in finger vein recognition.
In summary, the keypoints are generally considered insufficient in distinctiveness and quantity, as well as ambiguous in physical meaning; therefore, they have not been thoroughly studied. In this work, a MIMO strategy is designed to address these problems. In the MIMO strategy, inputs of curvature maps, vessel possibilities and normalized intensities are selected to represent the finger vein images. Through these processes, the extracted keypoints are improved in variety and physical meaning, and the distinctiveness of the keypoints is also enhanced. Moreover, the displacements of matching pairs form a dataset for image-level matching, which we define as the matching correspondence. As stated before, the matching correspondences include many false pairings, which can be further eliminated according to image-to-image deformation features. Correspondence-level false pairing elimination is deemed the crucial part of high security finger vein recognition and is further analyzed in the following section.

B. HIGH SECURITY RECOGNITION ANALYSIS
As stated in Section I, finger veins are hidden beneath the skin and cannot leave traces in daily life, so they are mainly utilized in high security applications rather than forensics. On this basis, false acceptances should be restrictively controlled or forbidden. In our opinion, the important point of reducing false acceptances is enlarging the margins between genuine and imposter matchings. Based on the keypoint-based method, false matching pairs should be eliminated, especially false acceptance matching pairs in genuine matching. To achieve this, the matching pairs of each image correspondence are supposed to be clustered to eliminate false matching pairs. In the following section, the features utilized for matching pair description and the simulated one-class clustering algorithm are introduced.

1) EFFECTIVENESS OF DEFORMATION FEATURES
Generally, the number of successful matching pairs is defined as the matching score. To further eliminate false acceptances for high security recognition, image-to-image matching pairs should be described using additional features. The matching pairs of two images are in accordance with the image-to-image deformations, i.e., the positional changes of two finger postures, which are regular in orientations and distances. Hence, the orientations and the displacement distances are extracted to describe the matching pairs. As reported in the literature [12], image-to-image positional changes are discriminative for genuine and imposter differentiations. In this paper, this information is extracted at the keypoint level, which is more effective because of deformation-tolerant one-to-many keypoint matching.

2) SIMULATED ONE-CLASS CLUSTERING
Differentiation of the true matching pairs from the false pairs is a typical binary classification problem. However, we do not treat it as a classification problem due to the following aspects. (1) The first is the class imbalance problem, in which, among the matching pairs, the number of true matching pairs is much larger than the false pairs, which may introduce uncertainties in classification. (2) The second is the variety of deformations, in which the matching pairs are different in orientations and distances among different image-to-image correspondences, which can also affect the classification. Similarly, traditional clustering is also inapplicable for matching pair classification on the whole dataset due to the differences between image correspondences, i.e., different image-to-image postures. For the differentiation of image-level matching, the class imbalance problem still exists for the true and false matching pairs in genuine matching, as well as for the false and seemingly true matching pairs in imposter matching. Moreover, classification or clustering must originate and be learned for each image-level matching.
As analyzed above, the matching pairs of one image-to-image correspondence in genuine matching are in accordance with the corresponding posture changes, while the matching pairs in imposter matchings are irregular and small in amount. Hence, to differentiate the true matching pairs from the false pairs, simulated one-class clustering is proposed at the correspondence level. In genuine matching, each true matching pair is characterized by similar positional changes between two images, which forms one specific class with irregularly spread false matching pairs. Thus, in the simulated one-class clustering, the consecutive values of each positional feature are assessed. The algorithm is also effective for imposter matching with false matching pairs because their feature values are scarcely consecutive. Through the proposed one-class clustering, false matching pairs, especially false acceptances, can be eliminated, which contributes to high security finger vein recognition.

III. PROPOSED FRAMEWORK
In this section, we first provide an overview of the proposed method and then describe each of the stages in detail.

A. METHOD OVERVIEW
The flowchart of the proposed method is shown in Figure 1. The method mainly consists of three components. First, the MIMO matching strategy, which is based on the SIFT descriptor, takes as inputs the images obtained by intensity enhancement, curvature map calculation and vessel possibility assessment. Second, positional features, namely orientations and displacement distances, are extracted to describe the matching pairs. Third, simulated one-class clustering is used for false matching pair removal. The number of remaining matching pairs is defined as the final matching score.

B. THE MIMO MATCHING STRATEGY
The multi-input multi-output (MIMO) strategy is proposed to boost the distinctiveness and quantity of extracted keypoints and image-to-image matching pairs. The keypoint-based SIFT descriptor is utilized as the base recognizer. The inputs are enhanced intensities, curvature maps and vessel possibilities. The multiple matching pairs for each image correspondence are fused to represent each genuine or imposter matching.

1) INTENSITY ENHANCEMENT
The first image processing procedure is intensity enhancement. As the image is always obscure and low in contrast, intensity unevenness correction [40] is adopted to improve the intensity distribution.
In this method, the additive image bias is estimated by convolving the image with a Gaussian filter with a standard deviation sigma = 3 and a size of 30 × 30. Then, we remove the estimated image bias from the original image and enhance the resulting image by histogram normalization. Finally, the image is smoothed by using a median filter with a size of 3 × 3 to remove random noise.
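The three steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the image is a list of rows of gray values, borders are handled by replicating edge pixels, the even 30-tap kernel is applied with its centre lying between two taps, and "histogram normalization" is read as a linear stretch to 0∼255. None of these choices is specified in the text, and all function names are hypothetical.

```python
import math


def gaussian_taps(sigma=3.0, size=30):
    """Normalised 1-D Gaussian weights (an even size has no central tap)."""
    centre = (size - 1) / 2.0
    taps = [math.exp(-((k - centre) ** 2) / (2.0 * sigma ** 2)) for k in range(size)]
    total = sum(taps)
    return [t / total for t in taps]


def clamp(index, length):
    """Replicate border pixels (an assumption, not stated in the paper)."""
    return min(max(index, 0), length - 1)


def blur_rows(img, taps):
    """Convolve every row with the 1-D kernel."""
    half = len(taps) // 2
    width = len(img[0])
    return [[sum(w * row[clamp(x + k - half, width)] for k, w in enumerate(taps))
             for x in range(width)] for row in img]


def transpose(img):
    return [list(column) for column in zip(*img)]


def estimate_bias(img, sigma=3.0, size=30):
    """Separable 2-D Gaussian blur used as the additive bias estimate."""
    taps = gaussian_taps(sigma, size)
    return transpose(blur_rows(transpose(blur_rows(img, taps)), taps))


def stretch_to_255(img):
    """Linear stretch to [0, 255]; stands in for histogram normalization."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = (hi - lo) or 1.0
    return [[255.0 * ((v - lo) / span) for v in row] for row in img]


def median3x3(img):
    """3 x 3 median filter with replicated borders."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = sorted(img[clamp(y + dy, h)][clamp(x + dx, w)]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            row.append(window[4])
        out.append(row)
    return out


def enhance(img):
    """Bias removal, intensity stretch and median smoothing, in that order."""
    bias = estimate_bias(img)
    corrected = [[v - b for v, b in zip(row, bias_row)]
                 for row, bias_row in zip(img, bias)]
    return median3x3(stretch_to_255(corrected))
```

Calling `enhance(img)` on a gray-level image returns an image of the same size with values in 0∼255. A production implementation would normally rely on an image processing library instead of nested lists.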

2) CURVATURE MAP ACQUIREMENT
In the image, the maximum curvature at a pixel corresponds to the orientation and magnitude of the maximum eigenvalue of the Hessian matrix at this pixel. In this paper, we use the magnitude of the maximum eigenvalue to describe finger vein images. To obtain this information, we first calculate the Hessian matrix of each pixel, which can be denoted as Equation (1):

$$H(x, y) = \begin{bmatrix} \dfrac{\partial^2 I}{\partial x^2} & \dfrac{\partial^2 I}{\partial x \partial y} \\ \dfrac{\partial^2 I}{\partial y \partial x} & \dfrac{\partial^2 I}{\partial y^2} \end{bmatrix} \tag{1}$$

where I(x, y) is the pixel value of image I at position (x, y). Then, we diagonalize the Hessian matrix as follows:

$$H(x, y) = P \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} P^{-1} \tag{2}$$

where P is the matrix of eigenvectors and the diagonalized Hessian matrix has two eigenvalues λ1 and λ2. Of the two eigenvalues, the larger one is generally considered to correspond to the maximum curvature of the neighboring pixels, while the eigenvector associated with the smaller eigenvalue corresponds to the perpendicular direction. Accordingly, we take the larger eigenvalue for each pixel and denote it as the corresponding curvature map, which can be defined as:

$$C(x, y) = \max\big(\lambda_1(x, y), \lambda_2(x, y)\big) \tag{3}$$

3) VESSEL POSSIBILITY OBTAINMENT
The finger veins can be considered dark elongated structures on bright backgrounds and vary in diameter and orientation. Hence, in our method, we obtain the vessel possibility by using the multiscale multidirectional deviation of Gaussian (MMSDG) [41]. We take the filter in the horizontal direction as the base filter. The size of the filter is set to (6σ + 1) × (6σ + 1), with σ values of 1, 2 and 3. Consequently, the steps of the MMSDG are as follows: (1) rotating the base filters into six directions, 0°, 30°, 60°, 90°, 120° and 150°, using bilinear interpolation; (2) filtering the finger vein image by convolving it with the generated multidirectional filters and then fusing the results by selecting the maximum intensity value; (3) generating images for different filter scales and then averaging the images; and finally, (4) normalizing the intensity of the resulting image to 0∼255 and then enhancing the image by histogram equalization.
The three processing operations are employed to enhance the image details, offer different perspectives of the image and increase the variety of the keypoints extracted subsequently. They are related to image enhancement, curvatures and vessel possibilities, respectively, and are therefore different from and complementary to each other.
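As an illustration of the curvature map input, the sketch below computes the larger Hessian eigenvalue at every pixel. It assumes that the second derivatives are approximated by central finite differences with replicated borders, which the text does not specify (derivative-of-Gaussian filters are an equally plausible choice), and it uses the closed-form eigenvalues of a symmetric 2 × 2 matrix. The function name is hypothetical.

```python
import math


def curvature_map(img):
    """Larger eigenvalue of the 2 x 2 Hessian at each pixel of a gray image."""
    h, w = len(img), len(img[0])

    def px(y, x):
        # Replicate border pixels (an assumption, not stated in the paper).
        return float(img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)])

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Central finite differences for the second-order derivatives.
            ixx = px(y, x + 1) - 2.0 * px(y, x) + px(y, x - 1)
            iyy = px(y + 1, x) - 2.0 * px(y, x) + px(y - 1, x)
            ixy = (px(y + 1, x + 1) - px(y + 1, x - 1)
                   - px(y - 1, x + 1) + px(y - 1, x - 1)) / 4.0
            # Eigenvalues of [[ixx, ixy], [ixy, iyy]] are mean +/- radius.
            mean = (ixx + iyy) / 2.0
            radius = math.sqrt(((ixx - iyy) / 2.0) ** 2 + ixy ** 2)
            row.append(mean + radius)
        out.append(row)
    return out


if __name__ == "__main__":
    # A dark vertical line on a bright background, similar to a vein.
    image = [[200, 200, 50, 200, 200] for _ in range(5)]
    print(curvature_map(image)[2])
```

For a dark line on a bright background, the response peaks on the line and stays at zero elsewhere in this toy image, which matches the description of veins as dark elongated structures.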

4) SIFT DESCRIPTOR
The scale-invariant feature transform (SIFT) descriptor is adopted as the benchmark descriptor to present finger vein images because it is frequently utilized in finger vein recognition and natural object registration and demonstrates good performance.
The SIFT descriptor, which consists of both feature extraction and saliency detection, is a kind of sparse descriptor in characterizing local gradient information. The principle of SIFT is finding extreme points in scale space and filtering them to obtain stable points. To generate a local descriptor, local features around these points are extracted. The feature extraction procedure is as follows: (1) scale-space extrema detection implemented using difference-of-Gaussian (DOG); (2) selection of stable keypoints from extrema points; (3) orientation assignment to each keypoint; and (4) generation of keypoint descriptors.
With stable features extracted using SIFT, the influence of rotations and translations can be partially eliminated. The SIFT descriptor extraction and matching utilized in this paper are implemented based on the VLFeat open source library version 0.9.9. To maintain generality, all the parameters are set as their default values.

5) SIFT MATCHING
The original SIFT matching pairs two descriptors according to the squared Euclidean distance. A descriptor f1 is matched to a descriptor f2 only if the distance d(f1, f2) multiplied by a threshold θ is not greater than the distance of f1 to all other descriptors.
This matching strategy may be problematic. For images A and B, keypoint a in image A can be successfully matched with more than one keypoint; moreover, the original matching is unilateral, so keypoint b in image B may be the best match for keypoint a while the converse does not hold.
Based on the above consideration, we match the two images A and B bilaterally. All the matching pairs are recorded, and the duplicate matching pairs are removed. Through the matching procedure, all the matching possibilities are stored, which improves the varieties of matching pairs.
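A minimal sketch of this bilateral matching is given below. Descriptors are plain lists of numbers, the threshold test follows the rule stated for the original SIFT matching, and the default threshold of 1.5 is an assumption chosen for illustration. The function names are hypothetical and do not reproduce the VLFeat implementation.

```python
def sq_dist(f, g):
    """Squared Euclidean distance between two descriptors."""
    return sum((a - b) ** 2 for a, b in zip(f, g))


def one_way_matches(src, dst, thresh=1.5):
    """Match each src descriptor to its nearest dst descriptor.

    A pair (i, j) is kept only if the best distance multiplied by thresh
    is not greater than the distance to every other dst descriptor.
    """
    pairs = []
    for i, f in enumerate(src):
        dists = sorted((sq_dist(f, g), j) for j, g in enumerate(dst))
        if not dists:
            continue
        best, j = dists[0]
        second = dists[1][0] if len(dists) > 1 else float("inf")
        if best * thresh <= second:
            pairs.append((i, j))
    return pairs


def bilateral_matches(desc_a, desc_b, thresh=1.5):
    """Union of A-to-B and B-to-A matches as (index_in_A, index_in_B) pairs."""
    forward = one_way_matches(desc_a, desc_b, thresh)
    # Swap the indices of the reverse matches so both lists share one order.
    backward = [(i, j) for j, i in one_way_matches(desc_b, desc_a, thresh)]
    return sorted(set(forward) | set(backward))  # the set removes duplicates


if __name__ == "__main__":
    desc_a = [[0, 0], [10, 10]]
    desc_b = [[0, 1], [10, 9], [50, 50]]
    print(bilateral_matches(desc_a, desc_b))
```

In the toy example, the third descriptor of image B finds a partner only in the B-to-A direction, so the bilateral union contains one more pair than the unilateral match.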

C. DEFORMATION-BASED FEATURE EXTRACTION
In this section, positional features for matching pair presentation are described in detail. Generally, there are many false matching pairs in both genuine and imposter matchings, which can affect the recognition performance. The positional information of the matching pairs is extracted to represent the matching pairs and remove false matching pairs.
Let a and b be the two images to be matched, let S denote the set of keypoint matching pairs, and let n be the number of matching pairs. For every matching pair, four correspondence features are extracted: the displacements in the horizontal and vertical directions d_x and d_y, the displacement distance d and the angle θ. The four features can be calculated as follows:

$$d_x(i) = S_b^x(i) - S_a^x(i) \tag{5}$$

$$d_y(i) = S_b^y(i) - S_a^y(i) \tag{6}$$

$$d(i) = \sqrt{\big(d_x(i) + w\big)^2 + d_y(i)^2} \tag{7}$$

$$\theta(i) = \arctan\frac{d_y(i)}{d_x(i) + w} \tag{8}$$

where S_a(i) and S_b(i) represent the keypoints from images a and b in the ith matching pair S(i), respectively, with i = 1, ..., n. S_a^x(i) and S_a^y(i) denote the horizontal and vertical coordinates of keypoint S_a(i), respectively, and S_b^x(i) and S_b^y(i) are defined likewise. d(i) is the displacement distance of the ith matching pair, with d_x(i) and d_y(i) representing the displacements in the horizontal and vertical directions, respectively. θ(i) denotes the displacement angle of the matching pair S(i). In Equations (7) and (8), the horizontal displacement d_x(i) is set as d_x(i) + w, where w is the width of the finger vein images. The reason for this setting is to ensure that the calculated distances and angles are stable. With the original coordinates, d and θ can be indistinctive because the deformations affect the calculation, and the distances and angles can spread in all directions over a relatively large interval.
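The four features can be computed with a few lines of Python, as sketched below. The sign convention (coordinates of image b minus coordinates of image a) is an assumption made for illustration, as are the function name and the input layout; only the shift of the horizontal displacement by the image width w is taken directly from the text.

```python
import math


def deformation_features(pairs, width):
    """Return (dx, dy, d, theta) for each matched keypoint pair.

    pairs: list of ((xa, ya), (xb, yb)) keypoint coordinates in images a and b.
    width: image width w added to the horizontal displacement before the
           distance and the angle are computed.
    """
    features = []
    for (xa, ya), (xb, yb) in pairs:
        dx = xb - xa                     # assumed sign convention: b minus a
        dy = yb - ya
        shifted = dx + width             # always positive when |dx| < width
        d = math.hypot(shifted, dy)      # displacement distance
        theta = math.atan2(dy, shifted)  # displacement angle in radians
        features.append((dx, dy, d, theta))
    return features


if __name__ == "__main__":
    matched = [((10, 20), (13, 24)), ((40, 30), (44, 33))]
    for row in deformation_features(matched, width=96):
        print(row)
```

Because the shifted horizontal displacement stays positive, the angle remains in a narrow interval around zero, which illustrates why the shift stabilizes d and θ.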

D. SIMULATED ONE-CLASS CLUSTERING
The positional information can be utilized to remove false matching pairs. The displacement distances and angles tend to be the same across true matching pairs because all the keypoints undergo the same deformation. Based on this observation, the keypoint matching pairs can be clustered; the false matching pairs are mostly outliers and can be removed. Because false pairs differ from case to case, multiclass clustering algorithms may not be effective. In this part, we design a simulated clustering algorithm for matching correspondences, which is given in Algorithm 1. This algorithm selects the matching pairs with similar deformations and behaves similarly to one-class clustering; thus, only the matching pairs with similar positional features remain.
The algorithm consists of the following steps. First, for each feature in the feature set F, hash the feature values into a histogram. Second, find the maximum run of consecutive nonzero bins in each histogram. Then, delete the matching pairs that do not fall into this run. The number of matching pairs remaining in the final F is taken as the matching score. More details can be seen in Algorithm 1. The only parameter of the algorithm, the bin number m, is assigned automatically according to the range of the feature values, which makes the algorithm robust and adaptive to new conditions.
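The steps above can be sketched as follows. This is an illustrative reading of the description, not a transcription of Algorithm 1: the fixed `bin_width` (from which the bin number m follows), the choice of the run that contains the most pairs, and the sequential filtering over the features are assumptions.

```python
import math

def dominant_run_mask(values, bin_width=1.0):
    """Flag the values that fall into the dominant run of non-empty bins."""
    lo = min(values)
    idx = [int(math.floor((v - lo) / bin_width)) for v in values]
    m = max(idx) + 1                  # bin number follows from the value range
    counts = [0] * m
    for k in idx:
        counts[k] += 1
    best = (0, 0, -1)                 # (start bin, end bin, pairs in the run)
    start = None
    for k in range(m + 1):            # the extra step closes a run at the last bin
        nonempty = k < m and counts[k] > 0
        if nonempty and start is None:
            start = k
        elif not nonempty and start is not None:
            total = sum(counts[start:k])
            if total > best[2]:
                best = (start, k, total)
            start = None
    return [best[0] <= k < best[1] for k in idx]

def simulated_clustering(features, bin_width=1.0):
    """features: one tuple of feature values per matching pair.

    Returns the indices of the pairs that survive the filtering on every feature.
    """
    keep = list(range(len(features)))
    n_features = len(features[0]) if features else 0
    for j in range(n_features):
        if not keep:
            break
        mask = dominant_run_mask([features[i][j] for i in keep], bin_width)
        keep = [i for i, ok in zip(keep, mask) if ok]
    return keep

# Five pairs with similar (distance, angle) values and two outliers.
feats = [(10.1, 2.0), (10.4, 2.3), (10.7, 2.1), (11.2, 2.5),
         (10.9, 2.2), (30.0, 2.1), (10.5, -15.0)]
kept = simulated_clustering(feats)
score = len(kept)                     # matching score = number of remaining pairs
```

In this toy input, the pair with an outlying first feature is removed in the first pass and the pair with an outlying second feature in the second pass, so five pairs remain.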

A. EXPERIMENTAL MATERIALS
The experiments are conducted on two publicly available databases. The first database is from Hong Kong Polytechnic University and is referred to as the HKPU database [27]. The second is from the MLA Lab of Shandong University. This finger vein database is a subset of the homologous multimodal traits (SDUMLA-HMT) database [34] and is referred to as the FV-SDUMLA-HMT database.
The images in the HKPU database [27] vary in intraclass deformations due to uncontrolled image acquisition. The images were captured during two separate sessions. Because not all the volunteers were present in the second session, only the first-session images are usually adopted in experiments. Consequently, the dataset used in our experiments includes 1,872 (156 × 2 × 6) images, which were captured from the index and middle fingers of the left hand of 156 volunteers, with 6 images of each finger. The images are preprocessed using the method in [12], and then all the images are resized to 96 × 64 with the intensities normalized to 0∼255, as shown in Figure 2.
The images from the FV-SDUMLA-HMT database are more complex in deformations and quality. The images were captured from 106 volunteers, and each subject contributed the index, middle and ring fingers of both hands, with 6 images of each finger. Thus, the second database includes 3,816 (106 × 6 × 6) images. All the images are preprocessed by the techniques described in the literature [17]. The preprocessed images are also resized to a scale of 96 × 64 with intensities normalized to 0∼255, as shown in Figure 2.

B. ANALYSIS OF THE PROPOSED METHOD
In this section, we test the proposed method in the verification and identification modes on both the HKPU and FV-SDUMLA-HMT databases.
In the verification mode, the performance of the proposed method is evaluated through full matching. The evaluation criteria are the equal error rate (EER), the false acceptance rate (FAR) at zero false rejection rate (FRR) (FAR-at-0-FRR), the FRR at zero FAR (FRR-at-0-FAR) and the receiver operating characteristic (ROC) curves, all computed from the matching score distributions. Consequently, there are 312 × C(6,2) = 4,680 genuine matchings and 312 × 6 × 311 × 6 = 3,493,152 imposter matchings on the HKPU database, and 636 × C(6,2) = 9,540 genuine matchings and 636 × 6 × 635 × 6 = 14,538,960 imposter matchings on the FV-SDUMLA-HMT database. In the literature, the FRR-at-0-FAR is generally emphasized in high security applications such as entrance guard systems for banks and military installations. In these scenarios, the FAR is expected to be as low as possible, so the FRR-at-0-FAR is tested. The FAR-at-0-FRR is often referred to in forensic applications, in which no possible suspect should be missed. There, the FRR should be as small as possible, so the FAR-at-0-FRR is calculated.
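The three verification criteria can be computed from the genuine and imposter score lists as in the following sketch. It assumes that a larger score means a better match (the score is a number of matching pairs), that a matching is accepted when its score reaches the threshold, and that the EER is approximated at the threshold where FAR and FRR are closest; the function names and toy scores are illustrative only.

```python
def far_frr(genuine, imposter, threshold):
    """FAR and FRR when a matching is accepted if score >= threshold."""
    far = sum(s >= threshold for s in imposter) / len(imposter)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def verification_metrics(genuine, imposter):
    """Return (EER, FRR-at-0-FAR, FAR-at-0-FRR)."""
    # Zero FAR requires a threshold above every imposter score.
    frr_at_0_far = sum(s <= max(imposter) for s in genuine) / len(genuine)
    # Zero FRR requires a threshold at or below every genuine score.
    far_at_0_frr = sum(s >= min(genuine) for s in imposter) / len(imposter)
    # Sweep all observed scores (plus one above the maximum) as thresholds.
    thresholds = sorted(set(genuine) | set(imposter))
    thresholds.append(thresholds[-1] + 1)
    far, frr = min((far_frr(genuine, imposter, t) for t in thresholds),
                   key=lambda p: abs(p[0] - p[1]))
    return (far + frr) / 2, frr_at_0_far, far_at_0_frr

# Toy score lists (numbers of matching pairs), not experimental data.
genuine = [12, 15, 9, 20, 3]
imposter = [0, 1, 2, 4, 1, 0, 2, 3]
eer, frr0, far0 = verification_metrics(genuine, imposter)
```

Under these definitions, a single genuine matching whose score is no higher than the lowest imposter score is enough to push the FAR-at-0-FRR to 1, which is why this criterion is very sensitive to poor-quality genuine images.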
In the identification mode, the proposed method is evaluated by simulating real identity recognition, i.e., finding the class to which the probe finger belongs. In this experiment, each finger vein image is utilized as the probe, and one template is randomly selected from each class for identification. Accordingly, there are 312 × 6 probes, 312 matchings for each probing process, and a total of 312 × 6 × 312 matchings on the HKPU database. There are 636 × 6 probes, 636 matchings for each probing process, and a total of 636 × 6 × 636 matchings on the FV-SDUMLA-HMT database. The matching scores for each probe are sorted, and the rank of the genuine matching score for each probe is calculated. The genuine matching score for each probe is expected to be the highest, which is defined as rank-one. The average rank-one recognition rates of ten repeated experiments are provided, and the cumulative match curves that depict the genuine ranks of the probes are illustrated to show the performance.
The performance of the proposed method is listed in Table 2. The EERs, FARs-at-0-FRR and FRRs-at-0-FAR in verification mode and the average rank-one recognition rates in identification mode are tabulated. The ROC curves and cumulative match curves are shown in Figure 3. From Table 2 and Figure 3(a), we can determine that the EERs on the HKPU and FV-SDUMLA-HMT databases are 0.0015 and 0.0139, respectively, which are promising. The FRR-at-0-FAR and FAR-at-0-FRR on the HKPU database are 0.0139 and 0.0491, respectively, which illustrates the potential of the proposed method in real applications. These two criteria are not satisfactory on the FV-SDUMLA-HMT database, with an FRR-at-0-FAR of 0.2377 and an FAR-at-0-FRR of 1, mainly because the image quality is poor and the difference between the genuine and imposter images is not evident.
The average rank-one recognition rates are 0.9991 and 0.9754 on the two databases, with variances of 3.8174e-07 and 4.2025e-06, respectively, which further implies the applicability of the proposed method in real scenarios. The identification mode shows the same trend as the verification mode: the rank-one recognition rate on the FV-SDUMLA-HMT database is lower than that on the HKPU database due to image quality problems.
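The rank of the genuine score and the cumulative match curve can be computed as in the sketch below. The pessimistic handling of ties (an imposter template with an equal score is ranked ahead of the genuine one) and the toy scores are assumptions for illustration.

```python
def genuine_rank(scores, true_class):
    """Rank of the genuine template among all class templates (1 is best).

    scores: dict mapping class label -> matching score of the probe
            against the template selected for that class.
    """
    g = scores[true_class]
    return 1 + sum(1 for c, s in scores.items() if c != true_class and s >= g)

def cumulative_match_curve(ranks, max_rank):
    """Fraction of probes whose genuine rank is at most k, for k = 1..max_rank."""
    n = len(ranks)
    return [sum(r <= k for r in ranks) / n for k in range(1, max_rank + 1)]

# Three toy probes matched against one template from each of three classes.
probes = [
    ({"f1": 30, "f2": 4, "f3": 2}, "f1"),
    ({"f1": 3, "f2": 5, "f3": 9}, "f2"),
    ({"f1": 1, "f2": 0, "f3": 12}, "f3"),
]
ranks = [genuine_rank(scores, label) for scores, label in probes]
curve = cumulative_match_curve(ranks, max_rank=3)
rank_one_rate = curve[0]
```

The first entry of the curve is the rank-one recognition rate; averaging it over repeated random template selections gives the kind of figure reported in Table 2.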
In the literature, methods are usually compared in the verification mode. In real applications, high security is emphasized, so the FRR-at-0-FAR is generally needed. Therefore, in the following comparative experiments, both the EER and the FRR-at-0-FAR in verification mode are tabulated.
In this experiment, we also analyze the time efficiency of the proposed method. The experiments are implemented using the MATLAB platform and conducted on a personal computer, and the time consumption is listed in Table 3. Here, Fea1, Fea2 and Fea3 denote the different image preprocessing operations of intensity enhancement, curvature map extraction and vessel possibility assessment, respectively. The time consumption of the preprocessing, feature extraction and matching is approximately 50.4073 ms, 17.3866 ms and 1.1941 ms, respectively, for one matching, which is 68.9880 ms in total and is suitable for real scenarios. Moreover, the time consumption can be further reduced by conducting the preprocessing offline and by reimplementing the method on a more efficient compiled platform.

C. ANALYSIS OF EACH COMPONENT
In this section, we analyze the necessity of each component of the proposed method. The experiments are conducted in verification mode on the two databases in terms of EER and FRR-at-0-FAR. In these experiments, when testing one part of the proposed method, the other components remain unchanged. The preprocessing, necessity of multiple features, and effectiveness of the simulated clustering are hence validated, and the results are listed in Table 4 and illustrated in Figures 4 and 5.
From Table 4 and Figures 4(a) and 5(a), we can determine that, without intensity enhancement, curvature map extraction and vessel possibility assessment, the EER and FRR-at-0-FAR are 0.0546 and 0.7686 on the HKPU database and 0.0964 and 0.5626 on the FV-SDUMLA-HMT database. Compared with the proposed method, whose EER and FRR-at-0-FAR are 0.0015 and 0.0139 on the HKPU database and 0.0139 and 0.2377 on the FV-SDUMLA-HMT database, it can be seen that the diversity of image content and the stability of the keypoints are important.
We then analyze the necessity of each preprocessing step before keypoint localization. In Table 4 and Figures 4(b)(c) and 5(b)(c), Fea1, Fea2 and Fea3 denote the results after intensity enhancement, curvature map extraction and vessel possibility assessment, respectively, and Feaa&b denotes that both preprocessing steps a and b are applied. From the table, we can see that each preprocessing step is effective. For Fea1, Fea2 and Fea3, the EERs are 0.0088, 0.0043 and 0.0033 and the FRRs-at-0-FAR are 0.0996, 0.0880 and 0.0438 on the HKPU database; on the FV-SDUMLA-HMT database, the EERs are 0.0316, 0.0287 and 0.0189 and the FRRs-at-0-FAR are 0.2612, 0.3303 and 0.3210. Among the single steps, Fea3 (vessel possibility) achieves the lowest EER on both databases, possibly because it involves vessel information, which is the main structure in finger vein images. However, compared with the proposed method, it still needs further improvement. The combinations Fea1&2, Fea1&3 and Fea2&3 generally improve the performance in terms of EER and FRR-at-0-FAR. The best EERs are 0.0021 and 0.0165, achieved with Fea1&2 on the HKPU database and Fea2&3 on the FV-SDUMLA-HMT database, respectively. However, Fea1&3 and Fea2&3 are inferior to single preprocessing on the HKPU database, with EERs of 0.0105 and 0.0056, which may be caused by the noncomplementarity of the two combined steps. Moreover, none of these combinations surpasses the proposed method, which combines all three preprocessing steps, in terms of EER.
We also compare the simulated clustering with average fusion and state-of-the-art clustering, as shown in Figures 4(a)(d) and 5(a)(d). In this operation, the matching pairs are meant to be homogeneous in displacements, with a reduced influence of erroneous matchings. The most intuitive strategy to achieve this objective is fusion; hence, we conduct average fusion with the genuine and imposter matching scores of the three preprocessing steps averaged after the criterion calculation. The EERs and FRRs-at-0-FAR are 0.0042 and 0.0295 on the HKPU database, respectively, and 0.0260 and 0.3046 on the FV-SDUMLA-HMT database, respectively. We can determine that average fusion attains roughly the average performance of the three preprocessing steps, which is inferior to the proposed method.
The reason is probably that the fusion strategy takes the matching pairs indiscriminately; thus, the mismatched pairs cannot be selected and eliminated. Another technique to differentiate the matching pairs is clustering, since the labels cannot be obtained in advance. The selected clustering methods are density-based spatial clustering of applications with noise (DBSCAN), k-means and hierarchical clustering (Hi-clustering). In these clustering algorithms, the number of clusters is set to 2, and the sample number of the larger cluster is taken as the final matching score. From Table 4, we can determine that the best EERs are acquired from Hi-clustering on the two databases. Hi-clustering considers the distributions of the samples, and the density is proven to be even. In terms of FRR-at-0-FAR on the FV-SDUMLA-HMT database, however, the density-based DBSCAN algorithm performs the best, mainly because there are image quality problems and the distinctiveness of the keypoints is not prominent. Nevertheless, most of the clustering algorithms are not comparable to the proposed method.
Based on the above analysis, we can determine that the three operations in the proposed framework are all essential for recognition. Each of the components improves the performance, and together they reach relatively low EERs and FRRs-at-0-FAR, which demonstrates the advantages of the proposed method and its applicability in commercial products.
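The two-cluster baselines discussed above score a matching by the size of the larger cluster. The sketch below illustrates this scoring with a small k-means (k = 2); the deterministic initialization, the two-dimensional (distance, angle) features and the toy values are assumptions for illustration and do not reproduce the experimental setup.

```python
import math

def two_means_score(points, iters=20):
    """Cluster the points with k-means (k = 2); return the larger cluster size."""
    if len(points) < 2:
        return len(points)
    # Deterministic initialization (assumption): the first point and the
    # point farthest from it serve as the two initial centers.
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))
    centers = [c0, c1]
    labels = []
    for _ in range(iters):
        new_labels = [0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
                      for p in points]
        if new_labels == labels:      # assignments are stable: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return max(labels.count(0), labels.count(1))

# Five consistent pairs plus two false pairs that deviate in different directions.
points = [(99.0, 4.0), (99.5, 4.2), (98.8, 3.9), (99.2, 4.1), (99.1, 4.0),
          (140.0, 30.0), (60.0, -25.0)]
baseline_score = two_means_score(points)
```

In this toy case, the two false pairs lie in different directions, so a two-cluster partition isolates only one of them and the larger cluster still contains the other. This is one way to picture why a fixed number of clusters can leave false pairs in the score when the false pairs do not form a single group.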

D. COMPARISON WITH KEYPOINT-BASED METHODS
We also compare the proposed method with typical existing keypoint-based methods on the two public databases, as shown in Table 5 and Figure 6. The methods are evaluated in verification mode using the EER and FRR-at-0-FAR. As not all of the methods were evaluated on the two databases, we implement these methods according to the original papers. The compared methods mostly use automatically extracted keypoints. The work of Pang et al. [35] adopts SIFT keypoints and descriptors, but the performance is not acceptable, with an EER of 0.1081 and an FRR-at-0-FAR of 0.9274. The methods of Kim et al. [10] and Liu et al. [36] also adopt SIFT keypoints, with the preprocessing of illumination normalization and vessel segmentation. We can see that through these strategies, the performance of SIFT-based recognition is improved, with EERs of 0.0105 and 0.0235 on the HKPU database and 0.0532 and 0.0624 on the FV-SDUMLA-HMT database. The preprocessing is effective because it can stabilize the extracted keypoints and enhance the features. Our previous work on illumination inhomogeneity removal and matching pair reconsideration [40] also achieves good performance, with EERs of 0.0114 and 0.0585 on the two databases. The method of Matsuda et al. [22] utilizes the Hessian matrix to detect keypoints and extract descriptors. This method is claimed to be deformation tolerant, and the EERs are 0.0379 and 0.0715 on the two databases.
From the comparison, we can determine that the preprocessing of images can boost the recognition performance. The methods compared are mostly based on automatically extracted keypoints because their performance is generally superior compared to the minutiae-based method; for example, the method of Liu et al. [13] achieves an EER of 0.0501 on the HKPU database. We can also determine that among all the compared methods, the SIFT descriptors are usually utilized as the benchmark descriptors and achieve good performance. Among all the compared methods, the proposed method achieves the best performance, which further demonstrates the effectiveness of preprocessing and utilization of matching information.
From the results in Table 6, we can determine that the proposed method achieves the lowest EER on the HKPU database. The traditional pattern-based methods are problematic due to the deformation problem, and the EERs of LBP, LLBP and LDC are 0.0833, 0.0937 and 0.0618, respectively. Through carefully designed or machine-learned feature selection, the EERs are reduced; for example, the EERs of the PBBM, DBC and DBD methods are 0.0278, 0.0144 and 0.0055, respectively. Among these methods, the DBD method achieves the lowest EER, but it may be limited by the learning strategy. The training and testing sets are homogeneous, which may lead to overfitting. Moreover, these methods may not be effective in multiterminal scenarios. The keypoint-based methods are compared in Section IV-D; here, we compare with one typical minutiae-based method and one automatic keypoint-based method. From the tabulated performance, we can determine that the existing keypoint-based methods are not thoroughly researched, and their performance is generally inferior to that of the other categories of methods. After deep analysis, we believe that keypoint-based methods are advantageous and propose a method based on matching correspondences. The performance is greatly improved, even compared with the improved versions in each category of recognition methods. Among the vessel-based methods, the ASAVE method performs best, with an EER of 0.0291 on the HKPU database. This method is a framework that ensembles the information between main vessels and branch vessels. The learning-based methods compared here are mostly deep learning-based methods. The main procedure of most deep learning-based methods is vessel segmentation. The best EER of the deep learning-based methods is 0.0033, from the work based on DenseNet-161. Nevertheless, this is still inferior to the proposed method, whose EER is 0.0015.
On the FV-SDUMLA-HMT database, the results demonstrate the same trend. Only the DBD method and the FV-GAN method surpass the proposed method. These two methods are learning-based and need careful training since they have a small number of testing samples and are prone to overfitting.
Based on the abovementioned analysis, the proposed method is superior to most of the existing methods. Although some of the learning-based methods demonstrate better performance on the FV-SDUMLA-HMT database, these methods may suffer from overfitting problems and be data-dependent. Moreover, the image quality of the FV-SDUMLA-HMT database is relatively poor, which is also a limitation for keypoint-based methods.

V. CONCLUSION
In this work, we design a finger vein recognition method based on robust keypoint correspondence clustering to achieve acceptable high security performance. Compared to traditional finger vein recognition methods, the proposed method focuses on FRRs at low FARs instead of solely reducing EERs. The proposed method consists of four main stages: image enhancement, base matching, correspondence feature extraction, and clustering. In image enhancement, the enhanced intensity, curvature maps and vessel possibilities are all adopted to involve different and complementary information. The experimental results in Table 4 demonstrate the effectiveness of this procedure. The EERs (0.0015 and 0.0139) and the FRRs-at-0-FAR (0.0139 and 0.2377) on the HKPU and FV-SDUMLA-HMT databases, respectively, are mostly the lowest among the comparative settings using fewer than three of the enhancement techniques. The base matching method utilized in this paper is based on the SIFT descriptors because of their rotation invariance and wide applicability. Existing SIFT-based methods are compared in Table 5 to show the superiority of the proposed method, and the EERs and FRRs-at-0-FAR are greatly reduced. Clustering with the designed deformation features is meant to keep correctly matched pairs and eliminate incorrect ones, based on the regularity of correspondences among genuine matchings. A simulated clustering algorithm is introduced to solve this problem. We also compare its performance with popular clustering algorithms and average fusion, and the proposed method is further shown to be effective, as can also be seen in Table 4. In addition to the component analysis, the proposed method is also compared to the existing benchmark methods, including the deep learning-based approaches, as shown in Table 6. From the tabulated EERs, we can determine the superiority of this work. The FRRs-at-0-FAR are not provided because most of the works ignore this criterion. However, lower EERs are generally related to lower FRRs-at-0-FAR.
In this work, the correspondence is extracted from keypoints prematched by the SIFT descriptors. From this viewpoint, the keypoint similarity and their correspondence features are considered separately. In the future, optimization methods combining both keypoint similarity and deformation information will be researched. Keypoint extraction robust to deformations and image quality variations will also be explored.

GUANG ZHANG received the M.Sc. degree from the School of Computer Science and Technology, Shandong University, where he is currently pursuing the Ph.D. degree with the School of Software Engineering. He is also a Director Physician with the First Affiliated Hospital of Shandong First Medical University. His research interests include machine learning, data mining, and medical image analysis.
XIANJING MENG received the Ph.D. degree from the School of Computer Science and Technology, Shandong University, in 2016. She is currently a Lecturer with the School of Computer Science and Technology, Shandong University of Finance and Economics. Her research interests include biometrics, medical image analysis, machine learning, and its applications.