
Pattern Recognition

Volume 46, Issue 8, August 2013, Pages 2156-2168

A robust face and ear based multimodal biometric system using sparse representation

https://doi.org/10.1016/j.patcog.2013.01.022

Abstract

If fusion rules cannot adapt to changes in the environment or individual users, multimodal systems may perform worse than unimodal systems when one or more modalities encounter data degeneration. This paper develops a robust face and ear based multimodal biometric system using Sparse Representation (SR), which integrates face and ear at the feature level and can effectively adjust its fusion rule based on the reliability difference between the modalities. We first propose a novel index, the Sparse Coding Error Ratio (SCER), to measure the reliability difference between face and ear query samples. SCER is then used to develop an adaptive feature weighting scheme that dynamically reduces the negative effect of the less reliable modality. In the multimodal classification phase, SR-based classification techniques are employed, namely Sparse Representation based Classification (SRC) and Robust Sparse Coding (RSC). Finally, we derive a category of SR-based multimodal recognition methods, including Multimodal SRC with feature Weighting (MSRCW) and Multimodal RSC with feature Weighting (MRSCW). Experimental results demonstrate that: (a) MSRCW and MRSCW perform significantly better than unimodal recognition using either face or ear alone, as well as the known multimodal methods; (b) the effectiveness of adaptive feature weighting is verified: MSRCW and MRSCW are very robust to image degeneration affecting one of the modalities, and even when the face (ear) query sample suffers from 100% random pixel corruption, they still achieve performance close to ear (face) unimodal recognition; (c) by integrating the advantages of adaptive feature weighting and sparsity-constrained regression, MRSCW appears to be an excellent choice for tackling the face and ear based multimodal recognition problem.

Highlights

► Introduce a new index, SCER, to measure the reliability difference between face and ear.
► Develop an adaptive feature weighting scheme for integrating face and ear features.
► Two multimodal methods based on sparse representation show promising robustness.

Introduction

Biometric systems relying on a single trait have to contend with a variety of practical problems such as noise, non-universality, an upper bound on identification accuracy, spoof attacks, etc. [1]. To address some of these limitations and improve recognition performance, multiple sources of information can be combined to form multimodal biometric systems, which are generally believed to be more reliable and have attracted much attention recently. In the past decade, various multimodal combinations have been reported, including face and fingerprint [2], [3], face and iris [4], fingerprint and iris [3], and face and ear [5], [6], [7], [8], [9], [10]. Based on the type of information available in a given module, different levels of fusion can be defined: sensor level, feature level, and match score, rank, and decision levels. Sanderson and Paliwal [11] categorized fusion performed at the former two levels as pre-classification fusion, and the rest as post-classification fusion. Post-classification fusion is fairly popular due to the ease of accessing and processing match scores, ranks, and individual decisions. In contrast, combination at an early stage is relatively difficult because the raw biometric data may contain noisy or redundant information, while features extracted from different biometric traits may be incompatible. Moreover, a multimodal system using feature level fusion does not work when one or more modalities of the query sample are unavailable [12]. Nevertheless, because of its capability of utilizing more information for classification, pre-classification fusion has drawn more attention in recent years. In particular, feature level fusion can exploit the most discriminative information and eliminate redundant/adverse information from the raw biometric data, and hence it is expected to provide better performance [4].
In this paper, we will focus on feature level fusion and intend to address some limitations existing in face and ear based multimodal biometric.

Most recently, Sparse Representation based Classification (SRC) techniques have been successfully applied to Face Recognition (FR), and have become state-of-the-art techniques in pattern recognition [13], [14], [15], [16], [17]. SRC first encodes the query sample over a dictionary of training samples and then assigns it to the class that yields the least squared coding error. SRC can be seen as a more general model than previous nearest classifiers, such as Nearest Neighbor (NN), Nearest Feature Line (NFL) [18] and Nearest Subspace (NS) [19], [20], [21]; it uses the samples of all classes to collaboratively represent the query sample, overcoming the small-sample-size problem in FR [14]. In this paper, we propose to apply SRC techniques, including the original SRC [13] and the Robust Sparse Coding model (RSC) [17], to face and ear based multimodal biometrics. Two SR-based multimodal methods are developed, namely Multimodal SRC (MSRC) and Multimodal RSC (MRSC). In these methods, appearance-based features of face and ear are separately extracted using Principal Component Analysis (PCA) [22] and are then directly concatenated in series.
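As a minimal illustration of the SRC decision rule described above (a sketch, not the paper's exact implementation), the snippet below codes a query over a column-normalized training dictionary with a simple l1-regularized least-squares solver (ISTA) and assigns it to the class whose coefficients yield the smallest reconstruction residual. The solver choice and regularization value are illustrative assumptions:

```python
import numpy as np

def ista_l1(D, y, lam=0.01, n_iter=500):
    """Solve min_x 0.5*||y - Dx||^2 + lam*||x||_1 via ISTA (proximal gradient)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth part
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ x - y)              # gradient of the least-squares term
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return x

def src_classify(D, labels, y, lam=0.01):
    """SRC: code y over the whole dictionary, then pick the class whose
    coefficients alone best reconstruct y (least residual)."""
    x = ista_l1(D, y, lam)
    residuals = {}
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)  # zero out other classes' coefficients
        residuals[c] = np.linalg.norm(y - D @ xc)
    return min(residuals, key=residuals.get), residuals
```

In practice dictionary atoms are normalized to unit l2 norm before coding, so the residual comparison across classes is meaningful.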

Generally, multimodal biometric systems, which incorporate more evidence from various modalities, can provide better performance than unimodal biometric systems [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [23], [24], [25]. However, if one or more modalities encounter data degeneration, these systems may perform worse than unimodal systems. This mainly results from the fact that most multimodal systems are based on fixed fusion rules, or on rules that cannot effectively adapt to changes in the environment or individual users. For face and ear based multimodal biometrics, to achieve good robustness, we propose an adaptive feature weighting scheme based on a novel index called the Sparse Coding Error Ratio (SCER). SCER can effectively measure the reliability difference between face and ear query images caused by various factors, such as illumination, pose, expression, corruption, and occlusion. The intuitive motivation is that in many applications face and ear query images are unlikely to suffer degeneration simultaneously, and even if that happens, the degeneration levels are often different. By incorporating the feature weighting scheme into MSRC and MRSC, we derive a second category of SR-based methods that can dynamically reduce the negative effect of the less reliable modality: MSRC with feature Weighting (MSRCW) and MRSC with feature Weighting (MRSCW). Finally, we conduct experiments on two virtual multimodal databases built from benchmark databases, namely the Extended Yale B [26], the AR face database [27] and the USTB ear database III [28]. Experimental results demonstrate that, under conditions without severe degeneration, both categories of SR-based methods not only significantly outperform face and ear unimodal recognition, but are also much better than multimodal recognition with common classifiers such as NN and NFL, including those using serial concatenation and CCA-based feature fusion schemes.
Moreover, the SR-based methods with feature weighting, MSRCW and MRSCW, show striking robustness to image degeneration affecting one of the modalities. Even when the query sample of one modality suffers from 100% random pixel corruption, they still achieve performance close to unimodal recognition with the other modality. It should be noted that the adaptive feature weighting scheme is not designed to settle the recognition problem when face and ear simultaneously encounter equivalent image degeneration. Therefore, one should not always expect MSRCW or MRSCW to improve over MSRC or MRSC, respectively. Our results also show that MRSCW and MRSC perform much better than all the other competing methods in such a scenario. This can be attributed to the RSC model, which applies a sparsity-constrained regression process to detect and reduce the effects of corrupted pixels. In our study, by integrating the advantages of adaptive feature weighting and sparsity-constrained regression, MRSCW appears to be an excellent choice for tackling the face and ear based multimodal recognition problem. Overall, our contribution in this paper is threefold:

  • (a) To measure the reliability difference between query samples of the two modalities, a novel index called SCER is introduced, which is critical for adaptive feature fusion;
  • (b) An adaptive feature weighting scheme based on SCER is developed for the dynamic fusion of face and ear features. The fusion scheme is flexible, and our extensive experiments show it to be very effective;
  • (c) SR-based classification techniques are employed for the first time in the face and ear based multimodal biometric field. By combining them with adaptive feature weighting, we derive two promising SR-based multimodal methods, namely MSRCW and MRSCW.
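The SCER-based adaptive weighting idea described above can be sketched as follows. The paper defines its exact weight mapping later (in Section 4); the ratio and weight functions below are illustrative assumptions only, chosen so that the modality with the larger sparse coding error receives the smaller weight before serial concatenation:

```python
import numpy as np

def sparse_coding_error(D, y, x):
    """Residual of coding query y over dictionary D with coefficients x."""
    return np.linalg.norm(y - D @ x)

def scer(e_face, e_ear, eps=1e-12):
    """Sparse Coding Error Ratio: values > 1 mean the face query is less reliable."""
    return e_face / (e_ear + eps)

def adaptive_weights(e_face, e_ear):
    """Hypothetical mapping from coding errors to modality weights (sums to 1).
    The published formula may differ; this only illustrates the monotonicity:
    a larger coding error yields a smaller weight."""
    r = scer(e_face, e_ear)
    w_face = 1.0 / (1.0 + r)
    return w_face, 1.0 - w_face

def weighted_fusion(face_feat, ear_feat, w_face, w_ear):
    """Serial concatenation of the weighted per-modality feature vectors."""
    return np.concatenate([w_face * face_feat, w_ear * ear_feat])
```

With equal coding errors this reduces to the equally weighted concatenation used by MSRC/MRSC, which matches the intuition that weighting should only intervene when the modalities' reliabilities diverge.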

The rest of this paper is organized as follows. We outline related work on face and ear based multimodal biometrics in Section 2. Section 3 provides a brief review of SR-based classification techniques, including the original SRC and the RSC model. In Section 4, our multimodal biometric system is described in detail, including the adaptive feature weighting scheme and the two categories of SR-based multimodal recognition methods. Section 5 presents experiments evaluating the proposed methods. Finally, concluding remarks are drawn in Section 6.

Section snippets

Related works on the combination of face and ear

Both face and ear have their pros and cons when used for recognition. Face Recognition is non-intrusive and user-friendly, and the technology is relatively mature; however, face appearance is prone to change with expression, eyeglasses, illumination, pose, etc. Ear Recognition (ER) is also non-intrusive. Compared with the face, owing to its relatively small surface and rich 3D structure, the ear is more likely to be obscured by hair and to have its appearance altered by uneven illumination. However,

Sparse representation based classification

The original goal of sparse representation (or coding) was the representation and compression of signals, potentially at sampling rates below the Shannon–Nyquist bound [29]. Nevertheless, Wright et al. [13] argued that sparse representation is naturally discriminative and designed a novel classification scheme, namely SRC, which was employed in FR and achieved impressive performance. Recently, many SR-based methods aiming to extend and improve SRC have been developed and

Our methodologies

With the advantage of utilizing more information for classification, pre-classification fusion methods are of particular interest to us. Compared to sensor level fusion, feature level fusion can exploit the most discriminative information and eliminate redundant/adverse information from the raw biometric data [4]; it can thus considerably reduce the dimensionality of the fused multimodal information. Since the complexity of sparse coding basically depends on the number of dictionary atoms and
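A minimal sketch of this feature-level pipeline, assuming plain SVD-based PCA fitted per modality followed by serial concatenation (the subspace dimensions and array shapes here are illustrative, not the paper's settings):

```python
import numpy as np

def pca_fit(X, k):
    """Fit PCA on X (n_samples x n_pixels); return the mean and top-k axes."""
    mu = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def project(x, mu, W):
    """Project one raw sample onto the learned PCA subspace."""
    return W @ (x - mu)

def fuse(face, ear, face_pca, ear_pca):
    """Extract each modality's PCA features, then concatenate in series."""
    f = project(face, *face_pca)
    e = project(ear, *ear_pca)
    return np.concatenate([f, e])
```

The fused vector's length is the sum of the two per-modality subspace dimensions, which is typically far smaller than the raw pixel count, keeping the subsequent sparse coding step cheap.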

Experiments and discussions

In this section, experiments are performed to evaluate the two categories of SR-based methods. For this purpose, common classifiers such as NN and NFL are used for comparison. Their multimodal extensions, which also use equally weighted feature concatenation, are called Multimodal NN (MNN) and Multimodal NFL (MNFL), respectively. In addition, two multimodal methods using CCA and KCCA presented in [8], [9] are used for comparison as well, named MCCA and MKCCA in

Conclusion

In this paper, we have presented a robust face and ear based multimodal biometric system, which can employ four proposed SR-based multimodal recognition methods: MSRC, MRSC, MSRCW and MRSCW. Compared with MSRC and MRSC, the most important advantage of MSRCW and MRSCW is that they utilize the proposed adaptive feature weighting scheme, through which they can effectively reduce the negative effect of the less reliable modality. In addition, to measure the reliability difference between

Conflict of interest

We declare that we have no conflict of interest.

Acknowledgment

This work is supported by the NSFC under Grants 61173182 and 61179071, as well as by funding from Sichuan Province under Grants 2011JY0124, 2012HH0004, 2012HH0031, and 2012GZ0095.

Zengxi Huang received the B.E. degree from Hainan University, Haikou, China, in 2007, and the M. E. degree from Shenyang Aerospace University, Shenyang, China, in 2010. He is currently pursuing his Ph.D. degree at Sichuan University, Chengdu, China. His research interests include image processing, computer vision and pattern recognition.

References (41)

  • K. Chang et al., Comparison and combination of ear and face images in appearance-based biometrics, IEEE Transactions on Pattern Analysis and Machine Intelligence (2003).
  • X. Xu et al., Multimodal recognition using ear and face profile based on CCA, Application Research of Computers (in Chinese) (2007).
  • X. Xu, Z. Mu, Feature fusion method based on KCCA for ear and profile face based multimodal recognition, in: ...
  • D.R. Kisku et al., Multimodal belief fusion for face and ear biometrics, Intelligent Information Management (2009).
  • J. Wright et al., Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence (2009).
  • M. Yang, L. Zhang, J. Yang, D. Zhang, Regularized robust coding for face recognition, arXiv:1202.4207v2 [cs.CV], ...
  • J. Huang, X. Huang, D. Metaxas, Simultaneous image transformation and sparse representation recovery, in: Proceedings ...
  • Z. Zhou, A. Wagner, H. Mobahi, J. Wright, Y. Ma, Face recognition with contiguous occlusion using Markov random fields, ...
  • M. Yang, L. Zhang, J. Yang, D. Zhang, Robust sparse coding for face recognition, in: Proceedings of the IEEE Conference ...
  • S. Li et al., Face recognition using the nearest feature line method, IEEE Transactions on Neural Networks (1999).


Yiguang Liu received the M.S. degree in Mechanics in 1998 from Peking University and the Ph.D. degree in Computer Application in 2004 from Sichuan University. Currently, he is the director of the Vision and Image Processing Laboratory and a professor in the College of Computer Science, Sichuan University. Prior to joining Sichuan University in 2005, he served as a software engineer or director in several companies such as Industrial Co., LTD of China South Communication. He was promoted to full professor in 2006, and was selected into the Program for New Century Excellent Talents of the MOE of P. R. China in 2008. He has worked as a Research Fellow at the National University of Singapore (2008), an academic visitor at Imperial College London under the support of the Royal Academy of Engineering (2011), and a senior visiting scholar at Michigan State University. He is a reviewer for Mathematical Reviews and a member of the IEEE and ACM. He has authored or co-authored one book and over 80 research papers published in international journals and conference proceedings. His current research interests include computer vision and image processing, pattern recognition, and computational intelligence.

    Chunguang Li received the M.S. degree in Pattern Recognition and Intelligent Systems and the Ph.D. degree in Circuits and Systems from the University of Electronic Science and Technology of China, Chengdu, China, in 2002 and 2004, respectively. Currently, he is a Professor with the Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China. His current research interests include computational neuroscience, statistical signal processing, and machine intelligence.

Menglong Yang received the M.S. degree and Ph.D. degree from the School of Computer Science and Engineering, Sichuan University, in 2008 and 2012, respectively. From July 2010 to June 2011, he worked at the Center for Biometrics and Security Research (CBSR), Institute of Automation, Chinese Academy of Sciences, as a visiting intern. He is currently a lecturer in the School of Aerospace Science and Engineering, Sichuan University. His research interests include computer vision, pattern recognition and transportation engineering.

    Liping Chen received his M.E. degree from Northeast Agricultural University, Harbin, China, in 2004. He is currently pursuing his Ph.D. degree at Sichuan University, Chengdu, China. His research interests include image processing, pattern recognition and artificial intelligence.
