
Pattern Recognition

Volume 46, Issue 8, August 2013, Pages 2156-2168

A robust face and ear based multimodal biometric system using sparse representation

https://doi.org/10.1016/j.patcog.2013.01.022

Abstract

If fusion rules cannot adapt to changes in the environment or individual users, multimodal systems may perform worse than unimodal systems when one or more modalities encounter data degeneration. This paper develops a robust face and ear based multimodal biometric system using Sparse Representation (SR), which integrates face and ear at the feature level and can effectively adjust its fusion rule based on the reliability difference between the modalities. We first propose a novel index, the Sparse Coding Error Ratio (SCER), to measure the reliability difference between face and ear query samples. SCER is then used to develop an adaptive feature weighting scheme that dynamically reduces the negative effect of the less reliable modality. In the multimodal classification phase, SR-based classification techniques are employed, namely Sparse Representation based Classification (SRC) and Robust Sparse Coding (RSC). Finally, we derive a category of SR-based multimodal recognition methods, including Multimodal SRC with feature Weighting (MSRCW) and Multimodal RSC with feature Weighting (MRSCW). Experimental results demonstrate that: (a) MSRCW and MRSCW perform significantly better than unimodal recognition using either face or ear alone, as well as the known multimodal methods; (b) the effectiveness of adaptive feature weighting is verified: MSRCW and MRSCW are very robust to image degeneration affecting one of the modalities, and even when the face (ear) query sample suffers from 100% random pixel corruption, they still achieve performance close to ear (face) unimodal recognition; (c) by integrating the advantages of adaptive feature weighting and sparsity-constrained regression, MRSCW appears to be an excellent choice for tackling the face and ear based multimodal recognition problem.

Highlights

► Introduce a new index, SCER, to measure the reliability difference between face and ear.
► Develop an adaptive feature weighting scheme for integrating face and ear features.
► Two multimodal methods based on sparse representation show promising robustness.

Introduction

Biometric systems relying on a single trait have to contend with a variety of practical problems such as noise, non-universality, an upper bound on identification accuracy, spoof attacks, etc. [1]. To address some of these limitations and improve recognition performance, multiple sources of information can be combined to form multimodal biometric systems, which are generally believed to be more reliable and have attracted much attention recently. In the past decade, various multimodal combinations have been reported, including face and fingerprint [2], [3], face and iris [4], fingerprint and iris [3], and face and ear [5], [6], [7], [8], [9], [10]. Based on the type of information available in a given module, different levels of fusion can be defined: sensor level, feature level, and match score, rank, and decision levels. Sanderson and Paliwal [11] categorized fusion performed at the former two levels as pre-classification fusion, and the rest as post-classification fusion. Post-classification fusion is fairly popular due to the ease of accessing and processing match scores, ranks, and individual decisions. In contrast, combination at an early stage is relatively difficult because the raw biometric data may contain noisy or redundant information, while features extracted from different biometric traits may be incompatible. Moreover, a multimodal system using feature level fusion does not work when one or more modalities of the query sample are unavailable [12]. Nevertheless, because of its capability of utilizing more information for classification, pre-classification fusion has drawn more attention in recent years. In particular, feature level fusion can exploit the most discriminative information and eliminate redundant/adverse information from the raw biometric data, and hence it is expected to provide better performance [4].
In this paper, we will focus on feature level fusion and intend to address some limitations existing in face and ear based multimodal biometric.

Most recently, Sparse Representation based Classification (SRC) techniques have been successfully applied to Face Recognition (FR), and have become state-of-the-art techniques in pattern recognition [13], [14], [15], [16], [17]. SRC first encodes the query sample over a dictionary of training samples and then assigns it to the class that yields the least squared coding error. SRC can be seen as a more general model than previous nearest classifiers, such as Nearest Neighbor (NN), Nearest Feature Line (NFL) [18] and Nearest Subspace (NS) [19], [20], [21]; it uses the samples of all classes to collaboratively represent the query sample, overcoming the small-sample-size problem in FR [14]. In this paper, we propose to apply SRC techniques, including the original SRC [13] and the Robust Sparse Coding model (RSC) [17], to face and ear based multimodal biometrics. Two SR-based multimodal methods are developed, namely Multimodal SRC (MSRC) and Multimodal RSC (MRSC). In these methods, appearance-based features of face and ear are separately extracted using Principal Component Analysis (PCA) [22] and are then directly concatenated in series.
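As a minimal illustration of the SRC decision rule described above (a sketch, not the paper's exact implementation), the snippet below codes a query over a column-normalized training dictionary with a simple l1-regularized least-squares solver (ISTA) and assigns it to the class whose coefficients yield the smallest reconstruction residual. The solver choice and regularization value are illustrative assumptions:

```python
import numpy as np

def ista_l1(D, y, lam=0.01, n_iter=500):
    """Solve min_x 0.5*||y - Dx||^2 + lam*||x||_1 via ISTA (proximal gradient)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth part
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ x - y)              # gradient of the least-squares term
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return x

def src_classify(D, labels, y, lam=0.01):
    """SRC: code y over the whole dictionary, then pick the class whose
    coefficients alone best reconstruct y (least residual)."""
    x = ista_l1(D, y, lam)
    residuals = {}
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)  # zero out other classes' coefficients
        residuals[c] = np.linalg.norm(y - D @ xc)
    return min(residuals, key=residuals.get), residuals
```

In practice dictionary atoms are normalized to unit l2 norm before coding, so the residual comparison across classes is meaningful.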

Generally, multimodal biometric systems, which incorporate more evidence from various modalities, can provide better performance than unimodal biometric systems [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [23], [24], [25]. However, if one or more modalities encounter data degeneration, these systems may perform worse than unimodal systems. This mainly results from the fact that most multimodal systems are based on fixed fusion rules, or on rules that cannot effectively adapt to changes in the environment or individual users. For face and ear based multimodal biometrics, to achieve good robustness, we propose an adaptive feature weighting scheme based on a novel index called the Sparse Coding Error Ratio (SCER). SCER can effectively measure the reliability difference between face and ear query images caused by various factors, such as illumination, pose, expression, corruption, and occlusion. The intuitive motivation is that in many applications face and ear query images are unlikely to suffer degeneration simultaneously, and even if that happens, the degeneration levels are often different. By incorporating the feature weighting scheme into MSRC and MRSC, we derive a second category of SR-based methods that can dynamically reduce the negative effect of the less reliable modality: MSRC with feature Weighting (MSRCW) and MRSC with feature Weighting (MRSCW). Finally, we conduct experiments on two virtual multimodal databases built from benchmark databases, namely the Extended Yale B [26], the AR face database [27] and the USTB ear database III [28]. Experimental results demonstrate that, under conditions without severe degeneration, both categories of SR-based methods not only significantly outperform face and ear unimodal recognition, but are also much better than multimodal recognition with common classifiers such as NN and NFL, including those using serial concatenation and CCA-based feature fusion schemes.
Moreover, the SR-based methods with feature weighting, MSRCW and MRSCW, show striking robustness to image degeneration affecting one of the modalities. Even when the query sample of one modality suffers from 100% random pixel corruption, they still achieve performance close to unimodal recognition with the other modality. It should be noted that the adaptive feature weighting scheme is not designed to settle the recognition problem when face and ear simultaneously encounter equivalent image degeneration. Therefore, one should not always expect MSRCW or MRSCW to improve over MSRC or MRSC, respectively. Our results also show that MRSCW and MRSC perform much better than all the other competing methods in such a scenario. This can be attributed to the RSC model, which applies a sparsity-constrained regression process to detect and reduce the effects of corrupted pixels. In our study, by integrating the advantages of adaptive feature weighting and sparsity-constrained regression, MRSCW appears to be an excellent choice for tackling the face and ear based multimodal recognition problem. Overall, our contribution in this paper is threefold:

  • (a) To measure the reliability difference between query samples of the two modalities, a novel index called SCER is introduced, which is critical for adaptive feature fusion;
  • (b) An adaptive feature weighting scheme based on SCER is developed for the dynamic fusion of face and ear features. The fusion scheme is flexible, and our extensive experiments show it to be very effective;
  • (c) SR-based classification techniques are employed for the first time in the face and ear based multimodal biometric field. By combining them with adaptive feature weighting, we derive two promising SR-based multimodal methods, namely MSRCW and MRSCW.
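The SCER-based adaptive weighting idea described above can be sketched as follows. The paper defines its exact weight mapping later (in Section 4); the ratio and weight functions below are illustrative assumptions only, chosen so that the modality with the larger sparse coding error receives the smaller weight before serial concatenation:

```python
import numpy as np

def sparse_coding_error(D, y, x):
    """Residual of coding query y over dictionary D with coefficients x."""
    return np.linalg.norm(y - D @ x)

def scer(e_face, e_ear, eps=1e-12):
    """Sparse Coding Error Ratio: values > 1 mean the face query is less reliable."""
    return e_face / (e_ear + eps)

def adaptive_weights(e_face, e_ear):
    """Hypothetical mapping from coding errors to modality weights (sums to 1).
    The published formula may differ; this only illustrates the monotonicity:
    a larger coding error yields a smaller weight."""
    r = scer(e_face, e_ear)
    w_face = 1.0 / (1.0 + r)
    return w_face, 1.0 - w_face

def weighted_fusion(face_feat, ear_feat, w_face, w_ear):
    """Serial concatenation of the weighted per-modality feature vectors."""
    return np.concatenate([w_face * face_feat, w_ear * ear_feat])
```

With equal coding errors this reduces to the equally weighted concatenation used by MSRC/MRSC, which matches the intuition that weighting should only intervene when the modalities' reliabilities diverge.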

The rest of this paper is organized as follows. We outline related work on face and ear based multimodal biometrics in Section 2. Section 3 provides a brief review of SR-based classification techniques, including the original SRC and the RSC model. In Section 4, our multimodal biometric system is described in detail, including the adaptive feature weighting scheme and the two categories of SR-based multimodal recognition methods. Section 5 presents experiments evaluating the proposed methods. Finally, concluding remarks are drawn in Section 6.

Section snippets

Related works on the combination of face and ear

Both face and ear have their pros and cons when used for recognition. Face Recognition is non-intrusive and user-friendly, and the technology is relatively mature; however, face appearance is prone to change with expression, eyeglasses, illumination, pose, etc. Ear Recognition (ER) is also non-intrusive. Compared with the face, owing to its relatively small surface and rich 3D structure, the ear is more likely to be obscured by hair and to have its appearance altered by uneven illumination. However,

Sparse representation based classification

The original goal of sparse representation (or coding) was the representation and compression of signals, potentially at sampling rates below the Shannon–Nyquist bound [29]. Nevertheless, Wright et al. [13] argued that sparse representation is naturally discriminative and designed a novel classification scheme, namely SRC, which was employed in FR and achieved impressive performance. Recently, many SR-based methods aiming to extend and improve SRC have been developed and

Our methodologies

With the advantage of utilizing more information for classification, pre-classification fusion methods are of particular interest to us. Compared to sensor level fusion, feature level fusion can exploit the most discriminative information and eliminate redundant/adverse information from the raw biometric data [4]; it can thus considerably reduce the dimensionality of the fused multimodal information. Since the complexity of sparse coding basically depends on the number of dictionary atoms and
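A minimal sketch of this feature-level pipeline, assuming plain SVD-based PCA fitted per modality followed by serial concatenation (the subspace dimensions and array shapes here are illustrative, not the paper's settings):

```python
import numpy as np

def pca_fit(X, k):
    """Fit PCA on X (n_samples x n_pixels); return the mean and top-k axes."""
    mu = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def project(x, mu, W):
    """Project one raw sample onto the learned PCA subspace."""
    return W @ (x - mu)

def fuse(face, ear, face_pca, ear_pca):
    """Extract each modality's PCA features, then concatenate in series."""
    f = project(face, *face_pca)
    e = project(ear, *ear_pca)
    return np.concatenate([f, e])
```

The fused vector's length is the sum of the two per-modality subspace dimensions, which is typically far smaller than the raw pixel count, keeping the subsequent sparse coding step cheap.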

Experiments and discussions

In this section, experiments are performed to evaluate the two categories of SR-based methods. For this purpose, common classifiers such as NN and NFL are used for comparison. Their multimodal extensions, which also use equally weighted feature concatenation, are called Multimodal NN (MNN) and Multimodal NFL (MNFL), respectively. In addition, two multimodal methods using CCA and KCCA presented in [8], [9] are used for comparison as well, named MCCA and MKCCA in

Conclusion

In this paper, we have presented a robust face and ear based multimodal biometric system, which can employ four proposed SR-based multimodal recognition methods: MSRC, MRSC, MSRCW and MRSCW. Compared with MSRC and MRSC, the most important advantage of MSRCW and MRSCW is that they utilize the proposed adaptive feature weighting scheme, through which they can effectively reduce the negative effect of the less reliable modality. In addition, to measure the reliability difference between

Conflict of interest

We declare that we have no conflict of interest.

Acknowledgment

This work is supported by the NSFC under Grants 61173182 and 61179071, as well as by funding from Sichuan Province under Grants 2011JY0124, 2012HH0004, 2012HH0031, and 2012GZ0095.

Zengxi Huang received the B.E. degree from Hainan University, Haikou, China, in 2007, and the M. E. degree from Shenyang Aerospace University, Shenyang, China, in 2010. He is currently pursuing his Ph.D. degree at Sichuan University, Chengdu, China. His research interests include image processing, computer vision and pattern recognition.

References (41)

  • K. Chang et al., Comparison and combination of ear and face images in appearance-based biometrics, IEEE Transactions on Pattern Analysis and Machine Intelligence (2003).
  • X. Xu et al., Multimodal recognition using ear and face profile based on CCA, Application Research of Computers (in Chinese) (2007).
  • X. Xu, Z. Mu, Feature fusion method based on KCCA for ear and profile face based multimodal recognition, in: ...
  • D.R. Kisku et al., Multimodal belief fusion for face and ear biometrics, Intelligent Information Management (2009).
  • J. Wright et al., Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence (2009).
  • M. Yang, L. Zhang, J. Yang, D. Zhang, Regularized robust coding for face recognition, arXiv:1202.4207v2 [cs.CV], ...
  • J. Huang, X. Huang, D. Metaxas, Simultaneous image transformation and sparse representation recovery, in: Proceedings ...
  • Z. Zhou, A. Wagner, H. Mobahi, J. Wright, Y. Ma, Face recognition with contiguous occlusion using Markov random fields, ...
  • M. Yang, L. Zhang, J. Yang, D. Zhang, Robust sparse coding for face recognition, in: Proceedings of the IEEE Conference ...
  • S. Li et al., Face recognition using the nearest feature line method, IEEE Transactions on Neural Networks (1999).


Yiguang Liu received the M.S. degree in Mechanics in 1998 from Peking University and the Ph.D. degree in Computer Application in 2004 from Sichuan University. Currently, he is the director of the Vision and Image Processing Laboratory and a professor in the College of Computer Science, Sichuan University. Prior to joining Sichuan University in 2005, he served as a software engineer or director in several companies such as Industrial Co., LTD of China South Communication. He was promoted to full professor in 2006, and was selected into the Program for New Century Excellent Talents of the MOE of P. R. China in 2008. He has worked as a Research Fellow at the National University of Singapore (2008), an academic visitor at Imperial College London under the support of the Royal Academy of Engineering (2011), and a senior visiting scholar at Michigan State University. He is a reviewer for Mathematical Reviews and a member of the IEEE and ACM. He has authored or co-authored one book and over 80 research papers published in international journals and conference proceedings. His current research interests include computer vision and image processing, pattern recognition, and computational intelligence.

    Chunguang Li received the M.S. degree in Pattern Recognition and Intelligent Systems and the Ph.D. degree in Circuits and Systems from the University of Electronic Science and Technology of China, Chengdu, China, in 2002 and 2004, respectively. Currently, he is a Professor with the Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China. His current research interests include computational neuroscience, statistical signal processing, and machine intelligence.

Menglong Yang received the M.S. degree and Ph.D. degree from the School of Computer Science and Engineering, Sichuan University, in 2008 and 2012, respectively. From July 2010 to June 2011, he worked at the Center for Biometrics and Security Research (CBSR), Institute of Automation, Chinese Academy of Sciences, as a visiting intern. He is currently a lecturer in the School of Aerospace Science and Engineering, Sichuan University. His research interests include computer vision, pattern recognition and transportation engineering.

    Liping Chen received his M.E. degree from Northeast Agricultural University, Harbin, China, in 2004. He is currently pursuing his Ph.D. degree at Sichuan University, Chengdu, China. His research interests include image processing, pattern recognition and artificial intelligence.
