Abstract
Visible and near-infrared (VIS–NIR) heterogeneous face recognition remains a challenging task because of the differences between the spectral components of the two modalities and the scarcity of paired VIS–NIR data. Inspired by the cycle-consistent generative adversarial network (CycleGAN), this paper proposes a facial-feature-embedded CycleGAN that translates between VIS and NIR face images, aiming to make the distributions of translated (fake) images similar to those of real images. To learn the domain-specific features of the NIR and VIS domains while preserving the facial representation common to both, a facial feature extractor (FFE), tailored to extracting effective features from face images, is embedded in the generator of the original CycleGAN. The FFE is implemented with a MobileFaceNet pre-trained on a VIS face database. Domain-invariant feature learning is further enhanced by a newly proposed pixel consistency loss. Additionally, we establish a new WHU VIS–NIR database, covering variations in face rotation and expression, to enrich the insufficient training data. Experiments on the well-known Oulu-CASIA NIR–VIS database and our WHU VIS–NIR database validate the benefit of the proposed FFE-based CycleGAN (FFE-CycleGAN). In particular, we achieve 96.5% accuracy on Oulu-CASIA and 98.9% accuracy on WHU VIS–NIR.
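The abstract combines the standard CycleGAN cycle-consistency term with a pixel consistency loss. As a minimal sketch of how such terms compose, assume the pixel consistency term is an L1 penalty between the translated image and its input (the paper's exact formulation may differ), and let `G` and `F` be toy stand-ins for the VIS-to-NIR and NIR-to-VIS generators; the weight `lambda_pix = 10` is purely illustrative.

```python
import numpy as np

def l1_loss(a, b):
    """Mean absolute error between two image arrays."""
    return float(np.mean(np.abs(a - b)))

def cycle_consistency_loss(G, F, x):
    """L1 distance between x and its round-trip translation F(G(x))."""
    return l1_loss(F(G(x)), x)

def pixel_consistency_loss(G, x):
    """Assumed form of the pixel consistency term: L1 distance between
    the translated image G(x) and the input x, encouraging the translation
    to preserve pixel-level facial structure."""
    return l1_loss(G(x), x)

# Toy invertible "generators" standing in for the two translation networks.
G = lambda img: np.clip(img * 0.9 + 0.05, 0.0, 1.0)    # VIS -> NIR (toy)
F = lambda img: np.clip((img - 0.05) / 0.9, 0.0, 1.0)  # NIR -> VIS (toy)

vis = np.full((4, 4), 0.25)          # toy 4x4 VIS "image"
cyc = cycle_consistency_loss(G, F, vis)   # round trip is exact here -> 0.0
pix = pixel_consistency_loss(G, vis)      # |0.275 - 0.25| = 0.025
total = cyc + 10.0 * pix                  # illustrative lambda_pix = 10
```

In a real training loop these terms would be added to the adversarial losses of both discriminators and minimized jointly over the two generators.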
References
Cao, B., Wang, N., Gao, X., Li, J., & Li, Z. (2019). Multi-margin based decorrelation learning for heterogeneous face recognition. In: Proceedings of the twenty-eighth international joint conference on artificial intelligence, IJCAI-19 (pp. 680–686).
Chen, J., Yi, D., Yang, J., Zhao, G., Li, S.Z., & Pietikäinen, M. (2009). Learning mappings for face synthesis from near infrared to visual light images. In: 2009 IEEE conference on computer vision and pattern recognition (pp. 156–163).
Chen, S., Liu, Y., Gao, X., & Han, Z. (2018). MobileFaceNets: Efficient CNNs for accurate real-time face verification on mobile devices. CoRR abs/1804.07573.
Deng, J., Guo, J., Niannan, X., & Zafeiriou, S. (2019). Arcface: Additive angular margin loss for deep face recognition. In: CVPR (pp. 4690–4699).
Huang, D., Sun, J., & Wang, Y. (2012). The BUAA-VisNir face database instructions. Technical report.
Fu, C., Wu, X., Hu, Y., Huang, H., & He, R. (2022). Dvg-face: Dual variational generation for heterogeneous face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(6), 2938–2952.
Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In: Proceedings of the 27th international conference on neural information processing systems (pp. 2672–2680).
Guo, Y., Zhang, L., Hu, Y., He, X., & Gao, J. (2016). MS-Celeb-1M: A dataset and benchmark for large-scale face recognition. In: B. Leibe, J. Matas, N. Sebe, M. Welling (Eds.), Computer Vision – ECCV (pp. 87–102).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 770–778). https://doi.org/10.1109/CVPR.2016.90.
He, R., Li, Y., Wu, X., Song, L., Chai, Z., & Wei, X. (2021). Coupled adversarial learning for semi-supervised heterogeneous face recognition. Pattern Recognition, 110, 107618.
He, R., Wu, X., Sun, Z., & Tan, T. (2017). Learning invariant deep representation for NIR–VIS face recognition. AAAI Conference on Artificial Intelligence, 4, 7.
He, R., Wu, X., Sun, Z., & Tan, T. (2019). Wasserstein CNN: Learning invariant features for NIR–VIS face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(7), 1761–1773.
Huang, G. B., Ramesh, M., Berg, T., & Learned-Miller, E. (2007). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst.
Huang, X., Lei, Z., Fan, M., Wang, X., & Li, S. Z. (2013). Regularized discriminative spectral regression method for heterogeneous face matching. IEEE Transactions on Image Processing, 22(1), 353–362.
Isola, P., Zhu, J., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5967–5976).
Jo, Y., Yang, S., & Kim, S. J. (2020). Investigating Loss Functions for Extreme Super-Resolution. In: 2020 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW) (pp. 1705–1712).
Juefei-Xu, F., Pal, D.K., & Savvides, M. (2015). NIR-VIS heterogeneous face recognition via cross-spectral joint dictionary learning and reconstruction. In: 2015 IEEE conference on computer vision and pattern recognition workshops (pp. 141–150).
Keinert, F., Lazzaro, D., & Morigi, S. (2019). A robust group-sparse representation variational method with applications to face recognition. IEEE Transactions on Image Processing, 28(6), 2785–2798.
Klare, B. F., & Jain, A. K. (2013). Heterogeneous face recognition using kernel prototype similarities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(6), 1410–1422.
Lei, Z., & Li, S. Z. (2009). Coupled spectral regression for matching heterogeneous faces. In: 2009 IEEE conference on computer vision and pattern recognition (pp. 1123–1128).
Lezama, J., Qiu, Q., & Sapiro, G. (2017). Not afraid of the dark: NIR–VIS face recognition via cross-spectral hallucination and low-rank embedding. In: 2017 IEEE conference on computer vision and pattern recognition (pp. 6807–6816).
Li, S.Z., Yi, D., Lei, Z., & Liao, S. (2013). The CASIA NIR–VIS 2.0 face database. In: 2013 IEEE conference on computer vision and pattern recognition workshops (pp. 348–353).
Lin, D., & Tang, X. (2006). Inter-modality face recognition. In: Proceedings of the 9th European conference on computer vision - Volume Part IV, ECCV’06 (pp. 13–26). Berlin, Heidelberg: Springer-Verlag.
Liu, X., Song, L., Wu, X., & Tan, T. (2016). Transferring deep representation for NIR-VIS heterogeneous face recognition. In: 2016 international conference on biometrics (ICB) (pp. 1–8).
Park, T., Efros, A. A., Zhang, R., & Zhu, J. Y. (2020). Contrastive learning for unpaired image-to-image translation. In: European conference on computer vision.
Peng, C., Wang, N., Li, J., & Gao, X. (2019). DLFace: Deep local descriptor for cross-modality face recognition. Pattern Recognition, 90, 161–171.
Peng, C., Wang, N., Li, J., & Gao, X. (2019). Re-ranking high-dimensional deep local representation for NIR–VIS face recognition. IEEE Transactions on Image Processing, 28(9), 4553–4565.
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In N. Navab, J. Hornegger, W. M. Wells, & A. F. Frangi (Eds.), Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 (pp. 234–241). Springer International Publishing.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF conference on computer vision and pattern recognition (pp. 4510–4520).
Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 815–823).
Shao, M., & Fu, Y. (2017). Cross-modality feature learning through generic hierarchical hyperlingual-words. IEEE Transactions on Neural Networks and Learning Systems, 28(2), 451–463.
Song, L., Zhang, M., Wu, X., & He, R. (2018). Adversarial discriminative heterogeneous face recognition. In: AAAI conference on artificial intelligence.
Sun, Y., Liang, D., Wang, X., & Tang, X. (2015). DeepID3: Face recognition with very deep neural networks. CoRR abs/1502.00873.
Wang, H., Zhang, H., Yu, L., Wang, L., & Yang, X. (2020). Facial feature embedded CycleGAN for Vis-Nir translation. In: IEEE international conference on acoustics, speech and signal processing (pp. 1903–1907).
Wang, R., Yang, J., Yi, D., & Li, S. Z. (2009). An analysis-by-synthesis method for heterogeneous face biometrics. In M. Tistarelli & M. S. Nixon (Eds.), Advances in biometrics (pp. 319–326). Springer.
Wu, F., Jing, X. Y., Feng, Y., Ji, Y. M., & Wang, R. (2021). Spectrum-aware discriminative deep feature learning for multi-spectral face recognition. Pattern Recognition, 111, 107632.
Wu, X., He, R., Sun, Z., & Tan, T. (2018). A light CNN for deep face representation with noisy labels. IEEE Transactions on Information Forensics and Security, 13(11), 2884–2896.
Yu, A., Wu, H., Huang, H., Lei, Z., & He, R. (2021). LAMP-HQ: A large-scale multi-pose high-quality database and benchmark for NIR–VIS face recognition. International Journal of Computer Vision, 129.
Yu, Y. F., Dai, D. Q., Ren, C. X., & Huang, K. K. (2017). Discriminative multi-layer illumination-robust feature extraction for face recognition. Pattern Recognition, 67, 201–212.
Zhang, K., Zhang, Z., Li, Z., & Yu, Q. (2016). Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10), 1499–1503.
Zhao, G., Huang, X., Taini, M., Li, S. Z., & Pietikäinen, M. (2011). Facial expression recognition from near-infrared videos. Image and Vision Computing, 29, 607–619.
Zhu, J., Park, T., Isola, P., & Efros, A.A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In: IEEE International conference on computer vision (ICCV) (pp. 2242–2251).
Zhu, J. Y., Zheng, W. S., Lu, F., & Lai, J. H. (2017). Illumination invariant single face image recognition under heterogeneous lighting condition. Pattern Recognition, 66, 313–327.
Acknowledgements
We thank Dr. Zhao et al. for offering the Oulu-CASIA NIR–VIS face expression database Zhao et al. (2011), which greatly helped us to further train and test the proposed method. We also express our appreciation to those who helped us collect data, select pictures, and ultimately build the WHU VIS–NIR paired face database. In addition, we would like to thank the people in Figs. 5 and 6 for their generous support.
This work was supported by Hubei Provincial Natural Science Foundation of China under Grant 2022CFB084.
Cite this article
Wang, H., Zhang, H., Yu, L. et al. Facial feature embedded CycleGAN for VIS–NIR translation. Multidim Syst Sign Process 34, 423–446 (2023). https://doi.org/10.1007/s11045-023-00871-1