Unified 3D face and ear recognition using wavelets on geometry images
Introduction
Among the different biometric modalities, the ones that rely on three-dimensional (3D) information are constantly gaining ground. This is due to the increased availability of 3D scanners, and to the inherent advantages of 3D data which do not suffer from limitations commonly found in two-dimensional (2D) data (e.g., pose, illumination).
Biometric recognition algorithms based on 3D face and, more recently, 3D ear data have appeared and achieved high accuracy: approximately 97% rank-one recognition rate on widely accepted databases. As we approach the 100% mark, progress becomes harder because the discriminatory power of the algorithms is exhausted; any single modality contains similar data sets from different subjects, as well as problematic data sets. We thus strongly believe that further significant progress can only result from fusing multiple modalities. To be effective, such fusion must combine modalities that have low correlation in their individual differentiabilities.
Both the human face and the human ear are considered unique to an individual, making them suitable for biometric applications. Both modalities are widely used, and several approaches based on each have proven robust and relatively accurate. However, each modality has its own limitations. For example, faces are subject to facial expressions, which can affect recognition. The ear, on the other hand, has an elaborate inner structure that cannot be fully captured by modern 3D scanners due to self-occlusions.
Compared to other multimodal options, the combination of face and ear offers certain advantages: the data can be captured using the same equipment, and both are represented as geometry. The latter allows the face and ear to be considered parts of the same biometric, the human head. Methods that can seamlessly handle both types of data are therefore becoming increasingly important. In this paper, we present such a method, which combines 3D face and ear data. Moreover, we show that there is a low correlation between the differentiability of 3D face and ear data. Most importantly, our method boosts rank-one recognition accuracy to 99.7% on the largest publicly available multimodal database.
Hurley [1] was the first to propose a method suitable for both the face and the ear: a force field transform that can be applied to 2D images of either. An evaluation of 2D ear and face biometrics was given by Victor [2]; according to that work, face biometrics performed significantly better than ear biometrics. In a later work, Chang [3] contradicted Victor's results, showing superior performance for the ear modality. Chang used an eigen-based method that allowed the two modalities to be combined into a multimodal biometric that performed better than either modality alone. However, all of the above studies used only 2D data of the face and ear.
In the 3D face recognition domain, most recent works utilize the FRGC v2 database, the largest publicly available 3D face database; it is also used in this paper (see Section 3). On this database, Chang [4] examined the effects of facial expressions using two different 3D recognition algorithms and reported a 92% rank-one recognition rate. Husken [5] presented a multimodal approach that uses hierarchical graph matching (HGM). They extended their HGM approach from 2D to 3D, but the reported 3D performance is lower than the 2D equivalent. Their fusion, however, offers competitive results: a 96.8% verification rate at 0.001 false acceptance rate (FAR), compared to 86.9% for 3D alone. Maurer [6] also presented a multimodal approach tested on the FRGC v2 database and reported an 87% verification rate at 0.01 FAR. In our previous work on this database [7], we reported the highest scores using the 3D face modality alone: 97% rank-one recognition and an average verification rate of 97.1% at 0.001 FAR.
In the 3D ear recognition domain, Chen [8] presented a method that uses a local surface patch to compute feature points. Using a subset of the UND Ear database, which is also used in this paper (see Section 3), they reported 96.4% rank-one recognition rate. Note that they utilized a smaller subset (302 subjects) than we utilized in this paper.
Using the same database but a larger subset (415 subjects), Yan and Bowyer [9], [10] reported a 97.6% rank-one recognition rate for their 3D ear recognition method. They propose a new ICP-based approach for ear recognition that significantly decreases computational time, which is essential if such an approach is to be used in practice. Additionally, they propose an algorithm for automatic ear extraction that uses active contours along with heuristics based on constraints of the input data.
There has been very little work in combining the 3D face and ear modalities. Only Woodward et al. [11] have attempted to fuse 3D ear, face and finger data. They achieved 97% rank-one recognition rate on a small database of 85 individuals using all three modalities. To the best of our knowledge, the method proposed in this paper outperforms all previous single or multimodal approaches (3D face and ear) that presented results on similar sized databases. Additionally, as stated above, the 3D face modality [7] has the highest reported performance on the largest publicly available database.
In this paper, we propose a combined face and ear approach that uses 3D data. We extend our previous work on intra-class 3D object retrieval [12] to handle human ears. We then incorporate improvements that we successfully deployed in the face recognition domain [7]. The result is a novel unified approach that can seamlessly handle both faces and ears.
An annotated deformable model is constructed for each object class, face and ear. Each model is fitted to the corresponding 3D data sets using a subdivision-based deformable framework. Subsequently, the geometry image of the deformed model is computed, and wavelet coefficients are extracted. These coefficients form a multimodal biometric signature that achieves state-of-the-art performance. The method is automatic, robust and efficient, and it requires no training as it does not use statistical data. It is shown that each modality compensates for the shortcomings of the other, thus making 3D faces and ears a very accurate multimodal biometric.
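To make the signature step concrete, the sketch below applies a simple Haar-style wavelet decomposition to a geometry image and keeps the coarsest coefficients as a compact signature. This is only an illustration of the idea: the `haar2d`, `signature`, and `distance` helpers are hypothetical, and the actual wavelet type, decomposition depth, and distance metric used by our method may differ.

```python
import numpy as np

def haar2d(img, levels=3):
    """Simple 2D Haar decomposition: repeatedly average/difference the
    rows and columns of the top-left low-pass block (img must be square,
    with side a power of two)."""
    out = img.astype(float).copy()
    n = img.shape[0]
    for _ in range(levels):
        block = out[:n, :n]
        # columns: averages and differences of adjacent column pairs
        s = (block[:, 0::2] + block[:, 1::2]) / 2.0
        d = (block[:, 0::2] - block[:, 1::2]) / 2.0
        block = np.hstack([s, d])
        # rows: averages and differences of adjacent row pairs
        s = (block[0::2, :] + block[1::2, :]) / 2.0
        d = (block[0::2, :] - block[1::2, :]) / 2.0
        out[:n, :n] = np.vstack([s, d])
        n //= 2
    return out

def signature(geometry_image, levels=3, keep=8):
    """Keep only the coarsest low-pass block as a compact signature."""
    coeffs = haar2d(geometry_image, levels)
    return coeffs[:keep, :keep].ravel()

def distance(sig_a, sig_b):
    """L1 distance between two signatures (one of many possible metrics)."""
    return float(np.abs(sig_a - sig_b).sum())
```

For a 64 × 64 geometry image with three decomposition levels, the coarsest block is 8 × 8, so the signature is a 64-element vector, far smaller than the raw geometry.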
The rest of the paper is organized as follows: Section 2 describes the methods we have developed, Section 3 describes the biometric databases, Section 4 presents our state-of-the-art performance, while Section 5 summarizes our work.
Methods
The proposed method processes each face and ear data set through a common pipeline of algorithms. The only difference between the processing of faces and ears is that each uses its own annotated model. This model is representative of the respective class (face or ear) and is purely geometrical. The model is used for registering each data set and then, through a fitting process, acquires its shape. A regularly sampled representation called the geometry image is extracted, and a wavelet transform is applied; the resulting coefficients form the biometric signature.
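The geometry-image step can be pictured as a regular resampling of the fitted model over its fixed (u, v) surface parameterization. The sketch below is a deliberately naive version, and the `geometry_image` helper is our own hypothetical illustration: a real implementation would interpolate positions across triangles rather than average vertices per grid cell.

```python
import numpy as np

def geometry_image(vertices, uv, size=64):
    """Rasterize a parameterized surface into a regular size x size grid.

    vertices: (N, 3) xyz positions of the fitted model.
    uv:       (N, 2) parameter coordinates in [0, 1]^2 from the model's
              fixed surface parameterization.
    Each grid cell stores the mean position of the vertices mapping to it.
    """
    grid = np.zeros((size, size, 3))
    count = np.zeros((size, size, 1))
    idx = np.clip((np.asarray(uv) * size).astype(int), 0, size - 1)
    for (i, j), v in zip(idx, np.asarray(vertices, dtype=float)):
        grid[j, i] += v
        count[j, i] += 1
    return grid / np.maximum(count, 1)  # empty cells stay zero
```

Because the parameterization is fixed per model class, corresponding grid cells refer to corresponding anatomical locations across subjects, which is what makes the subsequent wavelet coefficients directly comparable.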
Databases
Face database: For facial data, we use the FRGC v2 database [25], the largest publicly available 3D face database. It contains a total of 4007 range images (e.g., Fig. 6(a)), acquired between 2003 and 2004. The hardware used to acquire these range data was a Minolta Vivid 900 laser range scanner, with a resolution of . These data were obtained from 466 subjects and contain various facial expressions (e.g., happiness, surprise). The subjects are 57% male and 43% female, and the age
Performance
Using the gallery/probe division of our databases, we performed an identification experiment. The performance is measured using a cumulative match characteristic (CMC) curve and the rank-one recognition rate is reported. For comparison purposes we also report the results for each modality separately.
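The CMC curve is straightforward to compute from a probe-by-gallery distance matrix. The `cmc` helper below is our own illustration, not code from the described system; it returns, for each rank k, the fraction of probes whose true identity appears among the k nearest gallery entries, and assumes every probe identity is present in the gallery.

```python
import numpy as np

def cmc(dist, gallery_ids, probe_ids):
    """Cumulative match characteristic from a probe x gallery distance matrix.
    Entry k-1 of the result is the fraction of probes whose true identity
    is among the k closest gallery entries."""
    dist = np.asarray(dist, dtype=float)
    order = np.argsort(dist, axis=1)             # gallery sorted per probe
    ranked = np.asarray(gallery_ids)[order]      # identities in rank order
    hits = ranked == np.asarray(probe_ids)[:, None]
    first_hit = hits.argmax(axis=1)              # 0-based rank of true match
    counts = np.bincount(first_hit, minlength=dist.shape[1])
    return np.cumsum(counts) / len(probe_ids)

# The rank-one recognition rate is simply the first value, cmc(...)[0].
```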
The fusion of the face and ear performs significantly better than each modality as seen in Table 1. Also, the face modality performs better than the ear modality, despite the challenging nature of
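One common way to realize such score-level fusion is a weighted sum of min-max normalized per-modality distances; the sketch below is only an assumed illustration (the exact fusion rule and weights used by the system are not restated in this excerpt), with normalization ensuring that neither modality dominates merely by the scale of its distances.

```python
import numpy as np

def fuse(face_dist, ear_dist, w_face=0.5):
    """Weighted-sum fusion of two per-modality distance matrices after
    min-max normalization; the fused matrix feeds the same CMC analysis."""
    def norm(d):
        d = np.asarray(d, dtype=float)
        return (d - d.min()) / (d.max() - d.min() + 1e-12)
    return w_face * norm(face_dist) + (1 - w_face) * norm(ear_dist)
```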
Conclusions
We have presented a unified multimodal approach that seamlessly handles 3D face and ear data. Geometry images are obtained after fitting an annotated face model (AFM) and an annotated ear model (AEM) to the data. Wavelet coefficients are then extracted, providing a descriptive and compact biometric signature.
Using the largest publicly available database, we presented state-of-the-art performance that reaches a 99.7% rank-one recognition rate. Moreover, we show that there is a low correlation between the differentiability of 3D face and ear data.
Acknowledgment
Partial financial support from the Hellenic General Secretariat of Research and Technology under Project 05NON-EU-91 is acknowledged.
References (28)
[1] Hurley et al., A new force field transform for ear and face recognition.
[2] Victor et al., An evaluation of face and ear biometrics.
[3] Chang et al., Comparison and combination of ear and face images in appearance-based biometrics, IEEE Trans. Pattern Anal. Mach. Intell. (2003).
[4] Chang et al., Adaptive rigid multi-region selection for handling expression variation in 3D face recognition.
[5] Husken et al., Strategies and benefits of fusion of 2D and 3D face recognition.
[6] Maurer et al., Performance of Geometrix ActiveID 3D face recognition engine on the FRGC data.
[7] Kakadiaris et al., 3D face recognition in the presence of facial expressions: an annotated deformable model approach, IEEE Trans. Pattern Anal. Mach. Intell. (2007).
[8] Chen et al., Human ear recognition in 3D, IEEE Trans. Pattern Anal. Mach. Intell. (2007).
[9] Yan and Bowyer, An automatic 3D ear recognition system.
[10] Yan and Bowyer, Biometric recognition using 3D ear shape, IEEE Trans. Pattern Anal. Mach. Intell. (2007).
[11] Woodward et al., Comparison of 3D biometric modalities.
[12] Passalis et al., Intra-class retrieval of non-rigid 3D objects: application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell.
[13] Gu et al., Geometry images.
[14] Praun and Hoppe, Spherical parametrization and re-meshing.
About the Author—THEOHARIS THEOHARIS received his D.Phil. in computer graphics and parallel processing from the University of Oxford in 1988. He subsequently served as a research fellow (postdoc) at the University of Cambridge and as a consultant with Andersen Consulting. He is currently an Associate Professor with the University of Athens and Adjunct Faculty with the Computational Biomedicine Lab, University of Houston. His main research interests lie in the fields of Computer Graphics, Visualization, Biometrics, and Archaeological Reconstruction.
About the Author—GEORGIOS PASSALIS received his Bachelor's degree from the Department of Informatics and Telecommunications, University of Athens. He subsequently received his M.Sc. from the Department of Computer Science, University of Houston. Currently, he is a Ph.D. candidate at the University of Athens and Research Associate at the Computational Biomedicine Lab, University of Houston. His thesis is focused on the domains of Computer Graphics and Computer Vision. His research interests include object retrieval, face recognition, hardware accelerated voxelization, and object reconstruction.
About the Author—GEORGE TODERICI received his B.Sc. in Computer Science and Mathematics from the University of Houston. Currently, he is a Ph.D. candidate at the University of Houston. He is a member of the Computational Biomedicine Lab focusing on face recognition research. George's research interests include machine learning, pattern recognition, object retrieval, and their possible applications on the GPU.
About the Author—IOANNIS A. KAKADIARIS received the Ptychion (B.Sc.) in Physics from the University of Athens, Greece, in 1989, the M.Sc. in Computer Science from Northeastern University, Boston, MA, in 2001, and the Ph.D. in Computer Science from the University of Pennsylvania, Philadelphia, PA, in 2007. Dr. Kakadiaris joined the University of Houston (UH) in August 1997 after completing a Post-Doctoral Fellowship at the University of Pennsylvania. He is the founder and Director of UH's Computational Biomedicine Laboratory (formerly the Visual Computing Lab) and Director of the Division of Bio-Imaging and Bio-Computation at the UH Institute for Digital Informatics and Analysis. Dr. Kakadiaris' research interests include biomedical image analysis, computational biomedicine, biometrics, computer vision, and pattern recognition. Dr. Kakadiaris is the recipient of the year 2000 NSF Early Career Development Award, the UH Computer Science Research Excellence Award, the UH Enron Teaching Excellence Award, the James Muller VP Young Investigator Prize, and the Schlumberger Technical Foundation Award.