Region of Interest Localization Methods for Publicly Available Palmprint Databases

Many palmprint databases are now publicly available. However, not all of them provide the corresponding region of interest (ROI) images. If researchers test performance on ROI images they extracted themselves, the reported accuracies are not strictly comparable, because ROI localization is a critical stage of palmprint recognition: the localization precision has a significant impact on the final recognition accuracy, especially in unconstrained scenarios. This problem has limited the applications of palmprint recognition. However, many published surveys focus only on feature extraction and classification methods, even though many new ROI localization methods have been proposed in recent years. In this chapter, we group the existing ROI localization methods into categories, analyze their basic ideas, reproduce some of the code, compare their performance, and suggest further directions. We hope this can serve as a useful reference for further research.


Introduction
Palm-related biometrics can easily reach high accuracy for two reasons. One is that the palmprint contains plenty of features, such as principal lines, wrinkles, ridges and valleys, and minutiae points; the other is that the regions of interest (ROIs) can be aligned with the help of the finger valley points. Since the captured palms may have different rotations and scales, the extracted palmprint images should be aligned with each other to obtain high accuracy. This means the palmprint region should be localized in a relative coordinate system that is established based on the keypoints of the finger valleys. Most current palmprint recognition algorithms are based on the direction information of the palmprint lines and textures [1,2]. Hence, misalignment significantly affects the final matching score. A robust and precise ROI localization method is essential for palmprint recognition, especially for touchless applications. Many organizations have collected their own palmprint databases for different research targets. More and more novel databases have appeared in recent years; some are captured across different devices, some under different illuminations, and some at different distances.
In the following section, we review the current palmprint databases and ROI localization methods. Table 1 summarizes the current palmprint databases and compares some basic information. In Table 1, "official ROI" indicates whether official ROI images are provided, and "localization code" indicates whether the corresponding ROI extraction code is released. Some sample images of these databases are shown in Figure 1.

The Hong Kong Polytechnic University (PolyU) Palmprint Database
The PolyU Palmprint Database is the first publicly available palmprint database. It contains 7752 images captured from 386 hands in two sessions; around 10 images are collected for each palm in each session. The palmprint acquisition device is a contact-based device that consists of a high-quality industrial monochrome camera and a well-selected ring light source. The palm pose is also restricted by pegs, so the captured images are of high quality. However, the resolution of the released images, 384 × 284 pixels, is relatively low.

PolyU Palmprint Multi-spectral Database
Authentication by RGB or gray images alone may not be safe enough; attacks using fake palm images and videos can easily spoof the system. Hence, multispectral palmprint recognition has started to draw attention. Four spectra, red, green, blue, and near-infrared (NIR), are used to establish the PolyU Multi-spectral Palmprint Database. A contact-based device is employed to capture images from 250 volunteers in two sessions. In total, 24,000 images are captured from 500 palms. Each palm contributes six images in each session under each spectrum. Our observation shows that the images captured under blue light have the highest sharpness, while those captured under NIR light have the lowest.

CASIA Palmprint Image Database
In the CASIA Palmprint Database, 5502 images are captured from 312 subjects, around 9 images for each palm. The authors also built their own image acquisition device, which has a big enclosure and a black backboard. During the capture process, the user puts his/her palm into the enclosure back on the unicolor board; the ambient light is blocked by the enclosure. Hence, the palm images are captured in an ideal environment, but the sharpness of the palm images is not very high. Besides, some palm images are captured with significant rotations, and in some images the fingers have moved out of the imaging window. These factors make it difficult to localize the palmprint ROIs.

IIT Delhi (IITD) Touchless Palmprint Database version 1.0
The palm images in the IITD palmprint database are captured with large rotation variations. The touchless imaging setup consists of a big black box, a digital camera, and a circular fluorescent light source. It provides 2601 palm images collected from 470 hands, including 1301 left-palm images and 1300 right-palm images. The official ROI images have been normalized and thus show obvious principal lines and wrinkles.

PolyU-IITD Contactless Palmprint Images Database version 3.0
This database is collected over several years from volunteers in two countries, China and India, using a general-purpose handheld camera. In total, 14,000 images are captured from 1400 palms. The characteristic of this database is that the images are collected across different locations, times, occupations, and age ranges. Both normal and abnormal hands are involved (as is shown in Figure 1(k)).

COEP Palmprint Database
The palm images in this database have high resolutions. They are captured with a Canon PowerShot SX120 IS at a resolution of 1600 × 1200 pixels. According to the files' attribute information, the images reach 180 dots per inch (DPI). Most of the ridge and valley lines can be seen clearly. During the image capture process, the palm position is restricted by five pegs, so the captured images show only small rotations. It is a good dataset for studying palmprint image sharpness. The downloaded dataset contains 1305 images pertaining to 167 palms, around 8 images per palm.

GPDS100Contactlesshands2Band Database
Both a visible-light camera and an infrared (IR) camera are used to collect 1000 visible-light palm images and 1000 IR palm images from 100 volunteers. Each palm contributes 10 visible-light images and 10 IR images. The user places his/her palm over the camera and, without touching the device, adjusts the position and pose of the hand so that it overlaps with the hand mask drawn on the device screen. The image sharpness is not very high. However, it is a meaningful database because the image quality is closer to that of images captured in real-world applications.

KTU CVPR Lab. Contactless Palmprint Database
The author built a new device around a low-cost camera to capture palm images with a resolution of 768 × 576 pixels at 75 DPI. In total, 1752 images are collected from 145 palms, about 12 images for each palm. The images are captured under different ambient light intensities, backgrounds, finger postures (finger spacing and finger rings), and different hand distances, rotations, and translations. The image sharpness level of this database is not very high.

Tongji Palmprint Database
This was the biggest touchless palmprint database in 2017. The authors also built a novel palmprint acquisition device that consists of a digital camera, a ring light source, a screen, and a vertical enclosure. This device can capture both visible-light palmprint images and infrared palm vein images. During collection, the user's palm is put into the enclosure to avoid ambient light. At the same time, the upper screen shows the palm in real time, so that the user knows how to place his/her hand and when to stop and hold. In total, 12,000 images are captured from 600 hands in 2 sessions. For each palm, 10 palmprint images are collected in each session.

Tongji Mobile Palmprint Dataset
The device used in the Tongji Palmprint Database provides a stable environment for palmprint acquisition; this strategy can ensure the final recognition performance. However, the big enclosure also limits its applications. So the Tongji group further collected another novel database using widely available mobile phones. The palm images are captured in a natural indoor environment. Two mobile phones are used, a HUAWEI and a Xiaomi. This dataset contains 16,000 palmprint images from 400 palms collected in 2 sessions. In each session, each mobile phone captures 10 images for each palm. All palm images are labeled, and the corresponding code is released on the author's homepage [23].

Nanyang Technological University (NTU) Palmprint databases version 1
The NTU Palmprints from the Internet (NTU-PI-v1) database consists of 7781 hand images collected from the Internet. Hence, the palm images are captured in an uncontrolled and uncooperative environment. The images are collected from 2035 different palms of 1093 subjects of different ethnicities, sexes, and ages. Around four images are collected for each palm. It is the first large database established for studying palmprint recognition in the wild, but the image sharpness is relatively low compared with normal palmprint images. The NTU Contactless Palmprint Database (NTU-CP-v1) contains 2478 palm images captured from 655 palms of 328 subjects using Canon EOS 500D or NIKON D70s cameras, around four images for each palm. Currently, the samples for each category are relatively few compared with the other databases.

Related work on ROI localization
The most widely used ROI localization method is proposed in [4]. Its main idea is to first detect the keypoints of the finger valleys and then establish a local coordinate system based on the detected keypoints, so that the ROI coordinates are determined by the palm direction and position (as is shown in Figure 2). Most of the current ROI localization methods [10,27-32] are based on this strategy.
The main problem of ROI localization is keypoint detection. There are two approaches to localize the landmarks: one is to first segment the palm region and then search for landmarks on the detected edges using digital image processing techniques; the other is to regress the landmarks directly by utilizing both the hand shape and texture information.

Classical methods
One important goal of the first strategy mentioned above is simplifying the background. There exist three approaches:
1. Capture the palm with a unicolor backboard [4,6,8,12,21,33]
2. Employ an IR camera or a depth camera to capture an IR image or a depth image to assist palm segmentation [34,10,30,35]
3. Enhance the contrast of the foreground and background by setting a strong light source intensity and a short exposure time
Their target is enhancing the contrast of the palm region and the background. For example, in [28], the mobile phone's built-in LED flash is utilized for palm segmentation. When the flash is turned on, the palm surface is much brighter than the background, because the palm is much closer to the camera than the background. The built-in auto-exposure control function of the image signal processor (ISP) on the camera chip will automatically decrease the exposure time to capture proper palm images; the palm region should fall into the proper grayscale range. As a result, the captured background is very dark.
After hardware and acquisition mode optimization, the palm region can be segmented by skin-color thresholding or Otsu-based methods [30,31,36,37]. Maximum-connected-region detection is useful for removing background noise. After the palm region image is obtained, there are four approaches to detect the valley points:
1. Competitive valley detection algorithm [35], which traverses the contour pixels and tests each one by comparing the values of its neighboring pixels. After palm segmentation, a binary palm image is obtained. Each pixel on the palm contour is taken in turn as the center point, and 4, 8, and 16 testing points are placed around it in three successive tests. If the pixel values meet the predefined conditions in all three tests, a line is drawn from the center point toward the non-hand region. If this line does not cross any hand region, the center pixel is considered a valley location. The other candidate valley points are found in the same way.
2. Line-scan-based methods [4,27,33]. After rotation normalization, the pixels are tested along a row or a column according to the specific hand orientation. In the segmented hand image, the hand region pixels are set to white, and the background pixels are set to black. So once the pixel value changes from white to black or from black to white, a keypoint on the finger contour is detected. Then, the finger valleys can be obtained by edge tracking (as is shown in Figure 2(c)).
3. Local-extremum-based methods [30,33,38-41]. As is shown in Figure 2(a), by selecting a point as the start point, we can calculate the distances between the start point and all the palm contour points to generate a distance curve. On the distance curve, the local maximum points correspond to fingertips, and the local minimum points correspond to finger valley points. The finger valleys can be segmented from the palm contour around the detected valley points. Then the tangent line of the two finger valleys can be detected as the reference line.
4. Convex hull-based methods [28,42,43]. The minimum convex polygon that encloses the palm contour is detected. Generally, the fingertips are vertices of the convex hull. Then, the finger valleys and the valley points can be obtained with the methods mentioned above.
Generally, after the four finger valley points are obtained, we need to know whether the hand is left or right so that the two desired valley points can be determined. To identify the left and right hand, [35] uses geometric rules on the coordinates; [30] utilizes the valley areas, since the valley area between the thumb and index finger is generally bigger than that between the little and ring fingers; [41] trains a CNN for the classification; and the method proposed in [44] does not need the left or right information.
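The local-extremum idea in item 3 can be illustrated with a minimal Python sketch. It is not the implementation used in [30]; the function names and the synthetic five-lobed contour are assumptions made for illustration, and a real palm contour would need smoothing before its extrema are searched.

```python
import math


def distance_curve(contour, ref):
    """Distance from the reference point to every contour point, in contour order."""
    rx, ry = ref
    return [math.hypot(x - rx, y - ry) for x, y in contour]


def local_extrema(curve):
    """Return (maxima, minima) index lists of a closed (circular) 1-D curve."""
    n = len(curve)
    maxima, minima = [], []
    for i in range(n):
        prev, nxt = curve[i - 1], curve[(i + 1) % n]
        if curve[i] > prev and curve[i] > nxt:
            maxima.append(i)      # candidate fingertip
        elif curve[i] < prev and curve[i] < nxt:
            minima.append(i)      # candidate finger valley
    return maxima, minima


# Toy contour (assumption): a closed curve with five lobes standing in for fingers.
n = 100
contour = []
for i in range(n):
    t = 2 * math.pi * i / n
    r = 10 + 3 * math.cos(5 * t)
    contour.append((r * math.cos(t), r * math.sin(t)))

curve = distance_curve(contour, ref=(0.0, 0.0))
tips, valleys = local_extrema(curve)
```

On this toy shape the maxima and minima alternate along the contour, which is the pattern the methods above rely on to pair fingertips with the valleys between them.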
Rotation normalization and scale normalization are two key problems in palmprint preprocessing. In [33], the authors analyzed the existing methods and provided their optimized solutions for palm width detection and center point generation. Rotation normalization aims to rotate all the palms to a standard direction. Many methods have been proposed to determine the main direction of the palm. In [40], principal component analysis (PCA) is utilized to estimate the rotation angle of the palm. In [17,44], the authors used the training set to learn a regression model that maps the landmarks' coordinates to the palm's main direction. Hence, after landmark detection, the palm direction can be obtained from the regression model. In [30], the line crossing the middle fingertip and the palm center point is treated as the palm's center line; the palm's orientation is estimated from the line's slope.
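As an illustration of the PCA-based estimate, the sketch below computes the principal axis of the foreground pixels of a binary mask from their 2 × 2 covariance matrix. It is a minimal sketch and not the code of [40]; the function name and the toy masks are assumptions. The angle is measured in image coordinates (x to the right, y downward).

```python
import math


def principal_axis_angle(mask):
    """Angle in degrees, within [-90, 90], of the principal axis of the
    foreground pixels. mask is a list of rows; truthy values are palm pixels."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask contains no foreground pixels")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts) / n
    # Closed-form direction of the leading eigenvector of [[sxx, sxy], [sxy, syy]].
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))


horizontal_bar = [[1] * 10 for _ in range(3)]
vertical_bar = [[1] * 3 for _ in range(10)]
diagonal = [[1 if x == y else 0 for x in range(10)] for y in range(10)]

angle_h = principal_axis_angle(horizontal_bar)
angle_v = principal_axis_angle(vertical_bar)
angle_d = principal_axis_angle(diagonal)
```

The principal axis is only defined up to 180 degrees, so a separate cue (for example, the fingertip side) is still needed to decide which way the fingers point.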
The center point of the palm could be determined by different methods, such as the centroid of the palm region [40], the center point of the palm's maximum inscribed circle [30], the point which reaches the maximum distance value after distance transform [45,46], or the shift from the middle point of the palm width line detected based on heart line [33].
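The distance-transform variant can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a city-block distance computed by breadth-first search instead of a Euclidean distance transform, treats everything outside the image as background, and the function name and toy mask are hypothetical. The pixel with the largest distance approximates the center of the maximum inscribed circle.

```python
from collections import deque


def palm_center(mask):
    """Return (x, y, radius) of the foreground pixel farthest from the background.

    mask is a list of rows with truthy values for palm pixels. Ties are
    resolved in favor of the first pixel in row-major order.
    """
    h, w = len(mask), len(mask[0])
    # Pad with a one-pixel background frame so the image border acts as background.
    padded = [[0] * (w + 2)]
    padded += [[0] + [1 if v else 0 for v in row] + [0] for row in mask]
    padded += [[0] * (w + 2)]

    dist = [[0 if v == 0 else None for v in row] for row in padded]
    queue = deque((y, x) for y in range(h + 2) for x in range(w + 2)
                  if padded[y][x] == 0)
    while queue:  # multi-source breadth-first search from all background pixels
        y, x = queue.popleft()
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h + 2 and 0 <= nx < w + 2 and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))

    best = (0, 0, 0)
    for y in range(1, h + 1):
        for x in range(1, w + 1):
            if dist[y][x] > best[2]:
                best = (x - 1, y - 1, dist[y][x])  # back to unpadded coordinates
    return best


# Toy mask (assumption): a 5 x 5 "palm" block with one thin "finger" to the right.
mask = [[1 if (1 <= y <= 5 and 1 <= x <= 5) or (y == 3 and 6 <= x <= 8) else 0
         for x in range(9)] for y in range(7)]
center = palm_center(mask)
```

Because thin structures such as fingers never reach a large distance value, the maximum falls inside the palm body rather than being pulled toward the fingers, which is one reason this center is more stable than the plain centroid.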
With the hand rotation angle known, the palm image can be normalized to the standard direction. The next step is scale normalization, which means determining the side length of the ROI. The work reported in [10,17,21,30,33,47] used the palm width to determine the size of the ROI, while the work reported in [4,27-29] used the length of the tangent line. In [30], the author found that a large ROI performs better, perhaps because large regions reduce the influence of misalignment. Here, we provide two examples to illustrate the whole process of ROI extraction.
In [27], the center block (13 × 13 pixels) of the image is used to train the skin-color model, and then the palm region is segmented by skin-color thresholding. The candidate landmarks are obtained using the method proposed in [35]. The author proposed a two-stage strategy to achieve high robustness to palms with very large rotations or imperfect hand segmentations. In the first stage, the coarse palm direction is detected. Each candidate valley point generates its own direction angle, and the angles are partitioned into four coarse directions, namely, up, down, left, and right. The coarse main direction is the one with the most supporters, and the inconsistent angles are deleted. Then, the palm direction is calculated from the remaining angles. In the second stage, the palm image is rotated so that the four fingers point in the standard direction, and then the line-scan-based method is used to track the finger valleys. Finally, after the valley points are detected, the ROI is derived from the reference line generated by the two valley points.
In [30], the palm is segmented from the IR palm image by the Otsu and maximum connected domain algorithms. Then the center point of the palm is determined by the maximum inscribed circle. A start point is set to the right of the center point. Then, the two-phase keypoint detection method is used to detect the fingertips and valleys. First, the distance curve is generated from the start point and the palm contour points, and the fingertip of the middle finger is obtained. Based on the fingertip and the center point, a new reference point is generated to replace the start point used in the first phase. Then, with the palm orientation information, a new distance curve is generated. The precise fingertips and valley points are finally detected as the extremum points of this new distance curve. The tangent line of the valleys around the two detected valley points is then obtained (as is shown in Figure 2(a)), and the palm region is scanned using lines that are parallel to the tangent line. Each line provides a palm width value, and the final palm width is determined by their median value. Last, the ROI is derived from the reference line and the palm width.
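The first-stage voting can be sketched as below. This is only a minimal illustration of the idea, not the code of [27]: the bin layout (four 90-degree bins centered on right, up, left, and down), the circular mean used to combine the remaining angles, and all names are assumptions, since the source does not specify these details.

```python
import math
from collections import Counter

COARSE = ("right", "up", "left", "down")  # bins centered at 0, 90, 180, 270 degrees


def coarse_bin(angle_deg):
    """Map an angle to the nearest of the four coarse directions."""
    return COARSE[int(((angle_deg % 360) + 45) // 90) % 4]


def palm_direction(angles_deg):
    """Vote for a coarse direction, drop inconsistent angles, average the rest."""
    votes = Counter(coarse_bin(a) for a in angles_deg)
    winner, _ = votes.most_common(1)[0]
    kept = [a for a in angles_deg if coarse_bin(a) == winner]
    # Circular mean, so that angles near the 0/360 wrap-around average correctly.
    s = sum(math.sin(math.radians(a)) for a in kept)
    c = sum(math.cos(math.radians(a)) for a in kept)
    return winner, math.degrees(math.atan2(s, c)) % 360


# Hypothetical per-valley direction angles; 200 degrees plays the role of an outlier.
winner, angle = palm_direction([80, 95, 100, 85, 200])
```

Voting before averaging keeps a single badly detected valley point from dragging the estimated direction away from the consensus of the others.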
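The segmentation step of this second example (Otsu thresholding followed by keeping the maximum connected region) can be sketched in plain Python as follows. It is an illustrative sketch and not the code of [30]; the function names, the 4-connectivity choice, and the tiny test image are assumptions.

```python
from collections import deque


def otsu_threshold(gray):
    """Return the threshold t that maximizes the between-class variance.
    gray is a list of rows of integers in 0..255; pixels > t are foreground."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def largest_component(binary):
    """Keep only the largest 4-connected foreground region of a binary image."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                comp, queue = [], deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    out = [[0] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = 1
    return out


# Toy image (assumption): dark background, a bright 3 x 3 "palm", one bright noise pixel.
gray = [[20, 20, 20, 20, 20, 180],
        [20, 200, 200, 200, 20, 20],
        [20, 200, 200, 200, 20, 20],
        [20, 200, 200, 200, 20, 20],
        [20, 20, 20, 20, 20, 20]]
t = otsu_threshold(gray)
palm = largest_component([[1 if v > t else 0 for v in row] for row in gray])
```

The isolated bright pixel survives thresholding but is discarded by the connected-region step, which is the role the maximum connected domain plays in the pipeline described above.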

New-generation methods
The methods mentioned above are all based on traditional digital image processing techniques. Most of them use only the edge information of the palm. However, this is not sufficient, and it makes the algorithms sensitive to palm postures and background objects. In recent years, many new methods have been proposed, such as the active shape model (ASM)-based methods [48,49], the active appearance model (AAM)-based methods [17,29,50], the regression tree-based methods [47], and the deep learning-based methods [24,41]. The new-generation methods use both the edge and texture information to learn much more robust models for landmark regression. The main stages of palmprint ROI localization are detecting the palm region in the whole image, regressing the landmarks, determining the palm orientation and width, establishing the local coordinate system, and computing the ROI location.
In [17,44], 25 hand landmarks are selected to form a shape, including 10 end points and 15 landmarks of the finger valleys and palm boundary. This shape covers the finger roots and the interdigital regions of the palm. With the AAM algorithm, both the hand shape and the palm texture information are used, and the shape and its landmark points automatically deform to fit the real hand contour. To evaluate the localization performance, the authors proposed a modified point-to-curve distance and a margin width metric. Since the initial position of the shape model is critical to the regression performance, the fitting process is divided into two stages. First, five rotations and five scale factors are used to generate 25 initial shapes. After regression, only the 15 shape models with the lowest reconstruction errors are passed to the second stage for fine-grained regression.
In [41], the authors proposed a CNN framework based on LeNet [51] to detect the finger valley points. The proposed network involves convolutional layers and fully connected layers; the output is a six-dimensional vector corresponding to the three valley points between the fingers, excluding the thumb. In their work, two neural networks are designed: one identifies whether the hand is left or right, and the other localizes the landmarks. According to their experiments, the first network can perfectly identify whether the hand is a left or right hand, and the landmark localization performance is better than that of the classical method based on Otsu segmentation and Zhang's ROI localization algorithm [4].
In [24], based on VGG-16 [52], the authors designed an end-to-end neural network to localize the hand landmarks, generate the aligned ROI, and perform feature extraction and recognition at the same time. The hand region is extracted from the original Internet image and then resized to 227 × 227 pixels. The normalized palm image is fed into the designed CNN for aligned ROI localization, feature extraction, and classification. The proposed network consists of two subnetworks, the ROI localization and alignment network (ROI-LAnet) and the feature extraction and recognition network (FERnet). More than three landmarks are used by the authors so that non-rigid transformations can be parametrized.
In [47], the palm position is first detected using a sliding window, histogram of oriented gradients (HOG) features [53], and a support vector machine (SVM). In the training set, 14 landmark points are determined and labeled manually. After landmark point regression, the reference line is established from the two valley points. The position of the center point and the side length of the ROI are both determined by the palm width.
However, some challenging problems in ROI localization still await better solutions. Compared with the new-generation methods, which require manually labeled landmarks and a trained regression model, the classical methods, which rely on hardware and capture mode optimization and digital image processing algorithms, are easier to use. They can also achieve high localization precision thanks to their strict imaging conditions. Hence, in this chapter we still use the classical methods to extract the ROIs for the different palmprint databases.

The ROI localization method
As discussed above, palmprint localization involves four main stages: (1) palm region segmentation, (2) palm contour and finger valley landmark detection, (3) ROI coordinate computation, and (4) abnormal detection. The method used in this chapter is modified from [4,30]. For palm segmentation, Otsu-based methods can achieve good results on IR images, but for visible-light images the segmentation results are disturbed by the shadow regions on the palm surface. To achieve high success rates of ROI localization, a skin-color-based classifier is used to separate the palm region. The main work of landmark localization is detecting the finger valley between the index and middle fingers and the finger valley between the ring and little fingers (as is shown in Figure 2(a) and (d)). For contact-based palmprint images, the palm position is restricted by the pegs, so the finger valleys can be easily localized by line-scan-based methods (as is shown in Figure 2(c)). The pixels are tested from top to bottom: once the value changes from white to black, point p1 is detected. Testing continues, and once the pixel value changes from black to white, point p2 is detected. The other keypoints (p3 to p6) are searched in the same way. If p1 to p6 cannot all be detected in the current column, the scan line l_s moves from left to right by a predefined step to test a new column. This process is repeated until p1 to p6 are detected. Then, finger valleys v1 and v2 can be obtained by edge tracking; vp1 and vp2 are the detected tangent points. This line-scan-based method is effective for contact-based palm images. However, in touchless environments, most palms are captured with obvious rotations, and the space between fingers also varies a lot. Hence, the line-scan-based methods are not always workable. To deal with palm rotations, the hand direction should be detected first.
Hand orientation can be represented by the direction of the principal axis of the palm region pixels after principal component analysis, or by the direction of the line that passes through the palm center point c and the fingertip of the middle finger (as is shown in Figure 2(a)). The PCA-based method can be combined with the line-scan-based method to achieve simple detection for rotated palms. But we prefer the second strategy, because if someone's thumb is as long as the little finger, the performance of the line-scan-based method may decrease. Following the method proposed in [30], after palm segmentation and contour detection, the reference point rp is determined. Traversing the palm contour in the anticlockwise direction, we calculate the distance between rp and each contour point; a distance list is obtained after all the points are traversed. The distance curve is shown in Figure 2(b). The extremum points of this curve correspond to fingertips and valley points. Then the finger valleys can be obtained on the palm contour. After v1 and v2 are obtained, the tangent line can be detected by the method proposed in [4]. Its length is denoted as l_t. Based on the tangent line, the palm coordinate system can be established. Let d denote the distance between the ROI and the tangent line and s the ROI side length; both can be determined from l_t. Let d = α · l_t and s = β · l_t. Different values of α and β may result in different recognition performance.
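The column scan described above can be sketched as follows. This is a minimal illustration and not the released code: the function names, the convention of recording the row where the value changes, and the toy mask (palm on the left, four fingers pointing right) are assumptions.

```python
def scan_column(mask, x):
    """Scan column x from top to bottom and return the rows of p1..p6,
    or None when the column does not cross four separate fingers.

    Transitions are recorded only after the first white (hand) pixel has
    been met, so p1 is the first white-to-black change.
    """
    column = [1 if row[x] else 0 for row in mask]
    points = []
    seen_hand = False
    for y in range(1, len(column)):
        prev, cur = column[y - 1], column[y]
        if prev:
            seen_hand = True
        if seen_hand and prev != cur:
            points.append(y)  # row where the pixel value changes
    # Four fingers give six inner transitions, plus one more if the last
    # finger ends before the bottom of the image.
    return points[:6] if len(points) in (6, 7) else None


def find_keypoints(mask, start_x=0, step=1):
    """Move the scan line from left to right until p1..p6 are found."""
    for x in range(start_x, len(mask[0]), step):
        points = scan_column(mask, x)
        if points is not None:
            return x, points
    return None  # no column crosses four fingers


# Toy mask (assumption): columns 0-1 are palm, columns 2-5 hold four fingers.
finger_rows = {1, 2, 4, 5, 7, 8, 10, 11}
mask = [[1 if 1 <= y <= 11 and (x < 2 or y in finger_rows) else 0
         for x in range(6)] for y in range(13)]
result = find_keypoints(mask)
```

In this toy case the palm columns yield too few transitions and are skipped, and the first column that crosses all four fingers returns the six keypoint rows.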
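The ROI coordinate computation from the tangent points can be sketched as below. This is a hedged illustration: the source only states that d and s are proportional to l_t, so the choice to center the square on the perpendicular through the midpoint of the tangent segment, the use of the palm center c to pick the palm side, the function name, and the example values of α and β are all assumptions.

```python
import math


def roi_corners(vp1, vp2, center, alpha, beta):
    """Return the four corners of a square ROI in image coordinates.

    vp1, vp2 : tangent points on the two finger valleys
    center   : any point inside the palm, used only to choose the palm side
    alpha    : d = alpha * l_t, gap between the tangent line and the ROI
    beta     : s = beta * l_t, ROI side length
    """
    (x1, y1), (x2, y2) = vp1, vp2
    l_t = math.hypot(x2 - x1, y2 - y1)
    ux, uy = (x2 - x1) / l_t, (y2 - y1) / l_t   # unit vector along the tangent line
    nx, ny = -uy, ux                            # unit normal to the tangent line
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2       # midpoint of the tangent segment
    if (center[0] - mx) * nx + (center[1] - my) * ny < 0:
        nx, ny = -nx, -ny                       # make the normal point into the palm
    d, s = alpha * l_t, beta * l_t

    def point(along, depth):
        return (mx + along * ux + depth * nx, my + along * uy + depth * ny)

    return [point(-s / 2, d), point(s / 2, d),
            point(s / 2, d + s), point(-s / 2, d + s)]


# Hypothetical values: a horizontal tangent segment of length 10, palm below it.
corners = roi_corners((0, 0), (10, 0), center=(5, 8), alpha=0.2, beta=1.0)
```

Because d and s both scale with l_t, the extracted region grows and shrinks with the apparent hand size, which provides the scale normalization discussed earlier.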

Abnormal detection and iterative localization
The keypoint detection method described above is based on local vision. The algorithms know whether the predefined keypoints, p1 to p6 and vp1 and vp2, are obtained, but they cannot tell whether the detected points are the correct ones. Background noise and abnormal palm poses cause localization errors and thus generate abnormal ROIs. Those ROI images should be removed to avoid security risks. For example, during sample registration, if an ROI is falsely located in the background region, a black ROI image will be extracted and registered. This may cause a big risk in real-world applications, since anyone could then pass the system with a black image. Many deep learning-based image denoising methods have been proposed [54,55], but, to some extent, they are too time-consuming for palmprint image preprocessing. Hence, a high-speed abnormal ROI detection method is required. Here, the angle and scale of the ROI, the ratio of the background region (if there exist background regions in the located ROI), and the ratio of the widths of the two finger valleys are selected as features for abnormal detection. They are denoted as θ, l_t, r_bg, and r_h, respectively. The area of the ROI stands for the scale information; since it is determined by the tangent line length, we use l_t instead. Then, for each ROI localization, the feature vector {θ, l_t, r_bg, r_h} can be obtained. To train an SVM-based abnormal detector, first run the simple localization algorithm described above on the training set to generate different kinds of ROIs (as is shown in Figure 3). Then, separate the ROIs into normal and abnormal subsets. Last, train a binary classifier on them. According to our experiments, all the false ROIs in Figure 3 can be successfully detected. With the abnormal detector, for the line-scan-based method, once the currently localized ROI is rejected by the detector, the scan line can move to the next position to iteratively detect the ROI.
If the terminal condition is triggered, it means this image sample is unprocessable.
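The iterative loop can be sketched as follows. Note the substitution: instead of the trained SVM described above, this sketch plugs in a simple rule-based stand-in so that it stays self-contained. The feature limits, the function names, and the `max_attempts` terminal condition are hypothetical and not taken from the source.

```python
def background_ratio(roi_mask):
    """r_bg: share of background (falsy) pixels inside a candidate ROI patch."""
    pixels = [v for row in roi_mask for v in row]
    return sum(1 for v in pixels if not v) / len(pixels)


def rule_based_detector(features, limits):
    """Stand-in for the trained SVM: accept only if every feature is in range."""
    return all(lo <= features[name] <= hi for name, (lo, hi) in limits.items())


def localize_iteratively(candidates, is_normal, max_attempts=10):
    """Try candidate ROIs in order and return the first accepted one.

    Returns None when the terminal condition is triggered (candidates are
    exhausted or max_attempts is reached), i.e. the sample is unprocessable.
    """
    for attempt, features in enumerate(candidates):
        if attempt >= max_attempts:
            break
        if is_normal(features):
            return features
    return None


# Hypothetical acceptance ranges for theta (degrees), l_t (pixels), r_bg, and r_h.
LIMITS = {"theta": (-30.0, 30.0), "l_t": (40.0, 200.0),
          "r_bg": (0.0, 0.05), "r_h": (0.5, 2.0)}

candidates = [
    # First scan position: the ROI falls mostly on background, so it is rejected.
    {"theta": 5.0, "l_t": 90.0, "r_bg": background_ratio([[0, 0], [0, 1]]), "r_h": 1.1},
    # Next scan position: the ROI lies fully on the palm.
    {"theta": 3.0, "l_t": 95.0, "r_bg": background_ratio([[1, 1], [1, 1]]), "r_h": 1.0},
]
accepted = localize_iteratively(candidates, lambda f: rule_based_detector(f, LIMITS))
```

Passing the detector as a callable keeps the loop independent of the classifier, so a trained SVM decision function could replace the rule-based stand-in without changing the iteration logic.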

Experiments on different databases
In this section, the performance of the designed method is tested on different palmprint databases. More details and further updates can be found at [56]. The error rates of ROI localization are listed in Table 2. For IITD and COEP, the numbers in the hard samples column stand for PalmID_SampleID; for PolyU, the numbers stand for PalmID. After each experiment, the error cases are analyzed in detail.
For the IITD database, it is difficult to segment the palm region. The strong light source and the small enclosure lead to light reflections and ray occlusions, which generate many bright regions in the background and dark regions on the palm surface. Hence, the brightness information is not sufficient for segmenting the palm region, and the color information should be used. As is shown in Figure 5, we randomly cropped some palm skin and background image patches to build a training set for segmentation. An SVM-based binary classifier is learned from the training dataset; the segmented palm region can be seen in Figure 6. For palm skin patches, both bright and dark regions are selected to learn a precise classification plane. In the color space, the palm can be easily segmented from the unicolor background. After palm region segmentation, vp1 and vp2 are detected by the local-extremum-based method. Results: Figure 6 shows the IITD samples that are hard to process. If five fingertips and four finger valley points cannot be detected after palm segmentation, the system returns directly with an error code (as is shown in Figure 6(a) and (c)). If false keypoints are detected due to difficult palm poses, the finally extracted ROI images are abnormal images, which should be discarded in real-world applications (as is shown in Figure 6(b) and (d)).

ROI extraction for COEP Palmprint Database
The line-scan-based keypoint detection method is used for COEP. Figure 7 shows the ROI localization results on the COEP database. Since the pegs used in the imaging setup may interfere with the keypoint detection algorithm, they should be removed first. The pegs are green, blue, and yellow. After removing the bright yellow pixels in the image, we extract the red channel from the original RGB image and run the ROI localization algorithm on it. In this way, the green and blue pegs are removed automatically (as is shown in Figure 7). Results: as is shown in Figure 8, four images failed to be correctly localized. All four error cases are caused by closed fingers.
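The peg removal step can be illustrated with the short sketch below. The source does not give the color thresholds, so the `high` and `low` values, the function names, and the sample pixel values are assumptions chosen only to show the idea: yellow pixels are bright in red and green but dark in blue, while green and blue pegs are already dark in the red channel.

```python
def looks_yellow(pixel, high=200, low=100):
    """Hypothetical test for bright yellow: strong red and green, weak blue."""
    r, g, b = pixel
    return r >= high and g >= high and b <= low


def red_channel_without_pegs(rgb):
    """Zero out bright yellow pixels, then keep only the red channel.

    rgb is a list of rows of (r, g, b) tuples with values in 0..255.
    """
    return [[0 if looks_yellow(p) else p[0] for p in row] for row in rgb]


# One row of sample pixels (assumed values): skin, green peg, blue peg, yellow peg.
row = [(210, 160, 140), (30, 200, 40), (20, 40, 220), (240, 230, 40)]
gray = red_channel_without_pegs([row])
```

After this step only the skin pixel keeps a high value, so a grayscale threshold on the result separates the palm from all three peg colors.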

ROI extraction for PolyU Palmprint Database
For the PolyU database, which contains 7752 images, the line-scan-based method is used to localize the ROI. In the end, 42 samples failed to be localized. As is shown in Figure 9, most of the failures are caused by small finger valleys (palm pose) and non-ideal palm region segmentations (only grayscale information can be used). In Table 2, only the user ID is listed for the PolyU database.

Conclusions
The motivation of this chapter is to provide a uniform ROI localization method for extracting standard ROI images. This is very useful for comparing newly proposed feature extraction and identification algorithms. It can also lower the entry barrier to palmprint research for beginners, because preprocessing is very complex and time-consuming. The method used in this chapter is not intended for real-world applications; it is only an ROI extraction tool for the publicly available databases. In line with this goal, a simple method based on classical digital image processing and machine learning techniques is selected in this chapter.