A robust face and ear based multimodal biometric system using sparse representation
Highlights
- Introduce a new index, SCER, to measure the reliability difference between face and ear.
- Develop an adaptive feature weighting scheme for integrating face and ear features.
- Two multimodal methods based on sparse representation have promising robustness.
Introduction
Biometric systems relying on a single trait have to contend with a variety of practical problems such as noise, non-universality, an upper bound on identification accuracy, and spoof attacks [1]. To address some of these limitations and improve recognition performance, multiple sources of information can be combined into multimodal biometric systems, which are generally believed to be more reliable and have attracted much attention recently. In the past decade, various multimodal combinations have been reported, including face and fingerprint [2], [3], face and iris [4], fingerprint and iris [3], and face and ear [5], [6], [7], [8], [9], [10]. Based on the type of information available in a given module, different levels of fusion can be defined: sensor level, feature level, and match score, rank, and decision levels. Sanderson and Paliwal [11] grouped fusion performed at the former two levels into a pre-classification category and called the rest post-classification fusion. Post-classification fusion is fairly popular because match scores, ranks, and individual decisions are easy to access and process. In contrast, combination at an early stage is relatively difficult because the raw biometric data may be noisy or redundant, while features extracted from different biometric traits may be incompatible. Moreover, a multimodal system using feature-level fusion does not work when one or more modalities of the query sample are unavailable [12]. Nevertheless, because they can exploit more information for classification, pre-classification fusions have drawn increasing attention in recent years. In particular, feature-level fusion can exploit the most discriminative information and eliminate redundant or adverse information from the raw biometric data, and hence is expected to provide better performance [4].
In this paper, we focus on feature level fusion and aim to address some limitations of existing face and ear based multimodal biometrics.
Most recently, Sparse Representation based Classification (SRC) techniques have been successfully applied to Face Recognition (FR) and have become state-of-the-art in pattern recognition [13], [14], [15], [16], [17]. SRC first encodes the query sample over a dictionary of training samples and then assigns it to the class that yields the least squared coding error. SRC can be seen as a generalization of earlier nearest classifiers, such as Nearest Neighbor (NN), Nearest Feature Line (NFL) [18] and Nearest Subspace (NS) [19], [20], [21], and it uses the samples from all classes to collaboratively represent the query sample, which helps overcome the small-sample-size problem in FR [14]. In this paper, we propose to apply SRC techniques, including the original SRC [13] and the Robust Sparse Coding model (RSC) [17], to face and ear based multimodal biometrics. Two SR-based multimodal methods are developed, namely Multimodal SRC (MSRC) and Multimodal RSC (MRSC). In these methods, appearance-based features of face and ear are extracted separately using Principal Component Analysis (PCA) [22] and then concatenated directly in series.
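As a concrete illustration, the SRC decision rule described above can be sketched as follows. This is a minimal toy example, not the paper's implementation: it uses scikit-learn's `Lasso` as the ℓ1 sparse coder over a small hand-made dictionary whose columns are training atoms.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(D, labels, y, alpha=0.01):
    """Classify query y by sparse coding over dictionary D (atoms as columns).

    Returns the label whose class-restricted reconstruction of y has the
    smallest residual, following the original SRC scheme.
    """
    # Sparse-code the query jointly against all training atoms.
    coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    coder.fit(D, y)
    x = coder.coef_

    residuals = {}
    for c in np.unique(labels):
        # Keep only the coefficients belonging to class c.
        xc = np.where(labels == c, x, 0.0)
        residuals[c] = np.linalg.norm(y - D @ xc)
    return min(residuals, key=residuals.get)

# Toy dictionary: two classes spanning different directions.
D = np.column_stack([
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0],   # class 0 atoms
    [0.0, 1.0, 0.0], [0.0, 0.9, 0.1],   # class 1 atoms
])
labels = np.array([0, 0, 1, 1])
y = np.array([0.95, 0.05, 0.0])          # query close to class 0
print(src_classify(D, labels, y))        # → 0
```

The per-class residual step is what distinguishes SRC from plain sparse regression: all classes collaborate in coding the query, but each class is scored only by its own coefficients.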
Generally, multimodal biometric systems that incorporate more evidence from various modalities provide better performance than unimodal systems [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [23], [24], [25]. However, if one or more modalities suffer data degeneration, such systems may perform worse than unimodal ones. This is mainly because most multimodal systems rely on fixed fusion rules, or on fusion rules that cannot adapt effectively to changes in the environment or in individual users. To achieve good robustness in face and ear based multimodal biometrics, we propose an adaptive feature weighting scheme based on a novel index called the Sparse Coding Error Ratio (SCER). SCER can effectively measure the reliability difference between face and ear query images caused by various factors, such as illumination, pose, expression, corruption, and occlusion. The intuitive motivation is that in many applications face and ear query images rarely suffer degeneration simultaneously, and even when they do, the degeneration levels often differ. By incorporating the feature weighting scheme into MSRC and MRSC, we derive a second category of SR-based methods, MSRC with feature Weighting (MSRCW) and MRSC with feature Weighting (MRSCW), which can dynamically reduce the negative effect of the less reliable modality. Finally, we conduct experiments on two virtual multimodal databases built from benchmark databases: the Extended Yale B [26], the AR face database [27], and the USTB ear database III [28]. Experimental results demonstrate that, in the absence of severe degeneration, both categories of SR-based methods not only significantly outperform face and ear unimodal recognition, but also clearly surpass multimodal recognition with common classifiers such as NN and NFL, including variants using serial concatenation and CCA-based feature fusion.
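The adaptive weighting idea can be sketched in code. The paper's exact SCER definition is not reproduced here; the sketch below assumes one plausible form — the ratio of the two modalities' sparse coding residuals — and maps it to weights that shrink the less reliable modality. The names `coding_error` and `adaptive_weights` are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def coding_error(D, y, alpha=0.01):
    """Sparse-coding residual of query y over dictionary D (atoms as columns)."""
    coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    coder.fit(D, y)
    return np.linalg.norm(y - D @ coder.coef_)

def adaptive_weights(D_face, y_face, D_ear, y_ear):
    """Weight each modality inversely to its coding error.

    SCER is taken here as the ratio e_face / e_ear of the two residuals:
    a large SCER suggests a degraded face query, so face gets less weight.
    """
    e_face = coding_error(D_face, y_face)
    e_ear = coding_error(D_ear, y_ear)
    scer = e_face / (e_ear + 1e-12)
    w_face = 1.0 / (1.0 + scer)
    w_ear = 1.0 - w_face
    return w_face, w_ear

# Toy demo: a clean face query (representable by its dictionary) versus
# an ear query corrupted by heavy noise.
rng = np.random.default_rng(1)
D_face = rng.normal(size=(50, 20))
D_ear = rng.normal(size=(40, 20))
y_face = D_face @ rng.normal(size=20)      # lies in the span -> small error
y_ear = rng.normal(size=40) * 5.0          # pure noise -> large error
w_face, w_ear = adaptive_weights(D_face, y_face, D_ear, y_ear)
```

The weighted features would then be concatenated (e.g., `w_face * f_face` alongside `w_ear * f_ear`) before classification, so the degraded modality contributes proportionally less to the fused representation.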
Moreover, the SR-based methods with feature weighting, MSRCW and MRSCW, show striking robustness to image degeneration occurring in one of the modalities. Even when the query sample of one modality suffers 100% random pixel corruption, they still achieve performance close to unimodal recognition with the other modality. It should be noted that the adaptive feature weighting scheme is not designed to handle the case where face and ear simultaneously suffer equivalent image degeneration; one should therefore not always expect MSRCW or MRSCW to improve over MSRC or MRSC, respectively. Notably, MRSCW and MRSC still perform much better than all the other competing methods in this scenario. This can be attributed to the RSC model, which applies a sparsity-constrained regression process to detect and reduce the effect of corrupted pixels. By integrating the advantages of adaptive feature weighting and sparsity-constrained regression, MRSCW appears particularly well suited to the face and ear based multimodal recognition problem. Overall, our contribution in this paper is threefold, as follows:
- (a) To measure the reliability difference between query samples of the two modalities, a novel index called SCER is introduced, which is critical for adaptive feature fusion;
- (b) An adaptive feature weighting scheme based on SCER is developed for the dynamic fusion of face and ear features. The fusion scheme is more flexible than fixed-rule fusion, and is proven to be very effective by our extensive experiments;
- (c) SR-based classification techniques are employed for the first time in the face and ear based multimodal biometric field. By combining them with adaptive feature weighting, we derive two promising SR-based multimodal methods, namely MSRCW and MRSCW.
The rest of this paper is organized as follows. Section 2 outlines related work on face and ear based multimodal biometrics. Section 3 provides a brief review of SR-based classification techniques, including the original SRC and the RSC model. Section 4 describes our multimodal biometric system in detail, including the adaptive feature weighting scheme and the two categories of SR-based multimodal recognition methods. Section 5 presents experiments evaluating the proposed methods. Finally, concluding remarks are drawn in Section 6.
Section snippets
Related works on the combination of face and ear
Both face and ear have pros and cons when used for recognition. Face recognition is non-intrusive and user-friendly, and the technology is relatively mature; however, facial appearance is prone to change with expression, eyeglasses, illumination, pose, etc. Ear Recognition (ER) is also non-intrusive. Compared with the face, the ear has a relatively small surface and a rich 3D structure, so it may be more likely for hair to obscure the ear and for uneven illumination to change its appearance. However,
Sparse representation based classification
The original goal of sparse representation (or coding) was the representation and compression of signals, potentially at sampling rates below the Shannon–Nyquist bound [29]. Nevertheless, Wright et al. [13] argued that sparse representation is naturally discriminative and designed a novel classification scheme, namely SRC, which achieved impressive performance in FR. Recently, many SR-based methods aiming to extend and improve SRC have been developed and
Our methodologies
Given their advantage of utilizing more information for classification, pre-classification fusion methods are of particular interest to us. Compared with sensor level fusion, feature level fusion can exploit the most discriminative information and eliminate redundant or adverse information from the raw biometric data [4], and thus considerably reduce the dimensionality of the fused multimodal information. Since the complexity of sparse coding depends essentially on the number of dictionary atoms and
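A minimal sketch of the PCA-then-concatenate feature level fusion described in the introduction, using scikit-learn. The per-modality unit-norm normalization step is an assumption added so that neither modality dominates the joint vector; it is not a detail taken from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

def fuse_features(face_imgs, ear_imgs, n_components=10):
    """Project each modality with its own PCA, then concatenate in series.

    face_imgs, ear_imgs: (n_samples, n_pixels) arrays of vectorized images.
    Returns an (n_samples, 2 * n_components) fused feature matrix.
    """
    pca_face = PCA(n_components=n_components).fit(face_imgs)
    pca_ear = PCA(n_components=n_components).fit(ear_imgs)
    f = pca_face.transform(face_imgs)
    e = pca_ear.transform(ear_imgs)
    # Normalize each modality so neither dominates the fused feature.
    f /= np.linalg.norm(f, axis=1, keepdims=True)
    e /= np.linalg.norm(e, axis=1, keepdims=True)
    return np.hstack([f, e])

# Synthetic stand-ins for vectorized face (100 px) and ear (80 px) images.
rng = np.random.default_rng(0)
fused = fuse_features(rng.normal(size=(30, 100)), rng.normal(size=(30, 80)))
print(fused.shape)  # → (30, 20)
```

The fused vectors can then serve as the atoms of the multimodal dictionary, keeping the sparse coding cost low because the fused dimensionality stays small.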
Experiments and discussions
In this section, experiments are performed to evaluate the two categories of SR-based methods. For this purpose, the common classifiers NN and NFL are used for comparison. Their multimodal extensions, which also use equally weighted feature concatenation, are called Multimodal NN (MNN) and Multimodal NFL (MNFL), respectively. In addition, two multimodal methods using CCA and KCCA presented in [8], [9] are used for comparison, named MCCA and MKCCA in
Conclusion
In this paper, we have presented a robust face and ear based multimodal biometric system, which can employ any of four proposed SR-based multimodal recognition methods: MSRC, MRSC, MSRCW and MRSCW. Compared with MSRC and MRSC, the most important advantage of MSRCW and MRSCW is their adaptive feature weighting scheme, through which they can effectively reduce the negative effect of the less reliable modality. In addition, to measure the reliability difference between
Conflict of interest
We declare that we have no conflict of interest.
Acknowledgment
This work is supported by the NSFC under Grants 61173182 and 61179071, and by funding from Sichuan Province under Grants 2011JY0124, 2012HH0004, 2012HH0031, and 2012GZ0095.
Zengxi Huang received the B.E. degree from Hainan University, Haikou, China, in 2007, and the M. E. degree from Shenyang Aerospace University, Shenyang, China, in 2010. He is currently pursuing his Ph.D. degree at Sichuan University, Chengdu, China. His research interests include image processing, computer vision and pattern recognition.
References (41)
- et al., Structural hidden Markov models for biometrics: fusion of face and fingerprint, Pattern Recognition (2008)
- et al., Unified 3D face and ear recognition using wavelet on geometry images, Pattern Recognition (2008)
- et al., Identity verification using speech and face information, Digital Signal Processing (2004)
- et al., Multimodal biometrics using geometry preserving projections, Pattern Recognition (2008)
- et al., A new method of feature fusion and its application in image recognition, Pattern Recognition (2005)
- et al., Face and palmprint feature level fusion for single sample biometrics recognition, Neurocomputing (2007)
- et al., Handbook of Biometrics (2007)
- et al., Likelihood ratio-based biometric score fusion, IEEE Transactions on Pattern Analysis and Machine Intelligence (2008)
- Z. Wang, Q. Li, X. Niu, C. Busch, Multimodal biometric recognition based on complex KFDA, in: Proceedings of the 5th...
- et al., Face and ear: a bimodal identification system, Image Analysis and Recognition (2006)
- Comparison and combination of ear and face images in appearance-based biometrics, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Multimodal recognition using ear and face profile based on CCA, Application Research of Computers (in Chinese)
- Multimodal belief fusion for face and ear biometrics, Intelligent Information Management
- Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Face recognition using the nearest feature line method, IEEE Transactions on Neural Networks
Yiguang Liu received the M.S. degree in Mechanics from Peking University in 1998 and the Ph.D. degree in Computer Application from Sichuan University in 2004. Currently, he is the director of the Vision and Image Processing Laboratory and a professor in the College of Computer Science, Sichuan University. Before joining Sichuan University in 2005, he served as a software engineer or director in several companies, such as Industrial Co., Ltd. of China South Communication. He was promoted to full professor in 2006 and was selected for the Program for New Century Excellent Talents of the MOE of P. R. China in 2008. He has worked as a Research Fellow at the National University of Singapore (2008), an academic visitor at Imperial College London supported by the Royal Academy of Engineering (2011), and a senior visiting scholar at Michigan State University. He is a reviewer for Mathematical Reviews and a member of the IEEE and ACM. He has authored or co-authored one book and over 80 research papers published in international journals and conference proceedings. His current research interests include computer vision and image processing, pattern recognition, and computational intelligence.
Chunguang Li received the M.S. degree in Pattern Recognition and Intelligent Systems and the Ph.D. degree in Circuits and Systems from the University of Electronic Science and Technology of China, Chengdu, China, in 2002 and 2004, respectively. Currently, he is a Professor with the Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China. His current research interests include computational neuroscience, statistical signal processing, and machine intelligence.
Menglong Yang received the M.S. degree and Ph.D. degree from the School of Computer Science and Engineering, Sichuan University, in 2008 and 2012, respectively. From July 2010 to June 2011, he worked at the Center for Biometrics and Security Research (CBSR), Institute of Automation, Chinese Academy of Sciences, as a visiting intern. He is currently a lecturer in the School of Aerospace Science and Engineering, Sichuan University. His research interests include computer vision, pattern recognition and transportation engineering.
Liping Chen received his M.E. degree from Northeast Agricultural University, Harbin, China, in 2004. He is currently pursuing his Ph.D. degree at Sichuan University, Chengdu, China. His research interests include image processing, pattern recognition and artificial intelligence.