Facial feature embedded CycleGAN for VIS–NIR translation

  • Published in: Multidimensional Systems and Signal Processing

Abstract

Visible and near-infrared (VIS–NIR) heterogeneous face recognition remains a challenging task, owing to the spectral gap between the two modalities and the scarcity of paired VIS–NIR data. Inspired by the cycle-consistent generative adversarial network (CycleGAN), this paper proposes a facial feature embedded CycleGAN that translates between VIS and NIR face images, so that the distributions of translated (fake) images resemble those of real images. To learn the domain-specific characteristics of the NIR or VIS domain while preserving the facial representation shared across both, a facial feature extractor (FFE), tailored to extracting effective features from face images, is embedded into the generator of the original CycleGAN. The FFE is implemented with a MobileFaceNet pre-trained on a VIS face database. Domain-invariant feature learning is further strengthened by a new pixel consistency loss. Additionally, we build a new WHU VIS–NIR database with variations in face rotation and expression to enrich the limited training data. Experiments on the well-known Oulu-CASIA NIR–VIS database and our WHU VIS–NIR database validate the benefit of the proposed FFE-based CycleGAN (FFE-CycleGAN): we achieve 96.5% recognition accuracy on Oulu-CASIA and 98.9% on WHU VIS–NIR.

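The abstract describes the core design at a high level: a frozen, VIS-pretrained facial feature extractor (FFE) is embedded into the CycleGAN generator, and a pixel consistency loss encourages domain-invariant learning. As a rough illustration only, the following PyTorch sketch shows one way such an architecture could be wired up. The class `FFECycleGANGenerator`, the bottleneck feature-fusion strategy, and the L1 form of the pixel consistency loss are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch (NOT the paper's code) of embedding a pre-trained facial
# feature extractor (FFE) into a CycleGAN-style generator, plus one plausible
# reading of a "pixel consistency" loss. Fusion strategy is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FFECycleGANGenerator(nn.Module):
    """Hypothetical generator: an encoder-decoder whose bottleneck is
    conditioned on identity features from a frozen, pre-trained FFE."""

    def __init__(self, ffe: nn.Module, feat_dim: int = 128):
        super().__init__()
        self.ffe = ffe.eval()  # frozen, VIS-pretrained extractor
        for p in self.ffe.parameters():
            p.requires_grad_(False)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 7, 1, 3), nn.InstanceNorm2d(64), nn.ReLU(True),
            nn.Conv2d(64, 128, 3, 2, 1), nn.InstanceNorm2d(128), nn.ReLU(True),
        )
        # 1x1 conv that fuses the broadcast FFE embedding into the bottleneck
        self.fuse = nn.Conv2d(128 + feat_dim, 128, 1)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 3, 2, 1, output_padding=1),
            nn.InstanceNorm2d(64), nn.ReLU(True),
            nn.Conv2d(64, 3, 7, 1, 3), nn.Tanh(),
        )

    def forward(self, x):
        h = self.encoder(x)                  # (B, 128, H/2, W/2)
        with torch.no_grad():
            f = self.ffe(x)                  # (B, feat_dim) identity embedding
        f = f[:, :, None, None].expand(-1, -1, h.size(2), h.size(3))
        return self.decoder(self.fuse(torch.cat([h, f], dim=1)))


def pixel_consistency_loss(fake, real):
    """One plausible reading of a pixel consistency term: an L1 penalty
    between the translated image and its source, encouraging content
    (pose, expression) to survive the spectral translation."""
    return F.l1_loss(fake, real)


# Quick check with a stand-in FFE (the paper uses a VIS-pretrained MobileFaceNet):
if __name__ == "__main__":
    dummy_ffe = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 128))
    gen = FFECycleGANGenerator(dummy_ffe, feat_dim=128)
    x = torch.randn(2, 3, 128, 128)
    y = gen(x)
    print(y.shape, pixel_consistency_loss(y, x).item())  # torch.Size([2, 3, 128, 128])
```

Freezing the FFE is one way to keep identity cues stable across the translation; the fusion point (bottleneck concatenation here) is a design choice, not something the abstract specifies.
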
References

  • Cao, B., Wang, N., Gao, X., Li, J., & Li, Z. (2019). Multi-margin based decorrelation learning for heterogeneous face recognition. In: Proceedings of the twenty-eighth international joint conference on artificial intelligence, IJCAI-19 (pp. 680–686).

  • Chen, J., Yi, D., Yang, J., Zhao, G., Li, S.Z., & Pietikäinen, M. (2009). Learning mappings for face synthesis from near infrared to visual light images. In: 2009 IEEE conference on computer vision and pattern recognition (pp. 156–163).

  • Chen, S., Liu, Y., Gao, X., & Han, Z. (2018). MobileFaceNets: Efficient CNNs for accurate real-time face verification on mobile devices. CoRR abs/1804.07573.

  • Deng, J., Guo, J., Xue, N., & Zafeiriou, S. (2019). ArcFace: Additive angular margin loss for deep face recognition. In: CVPR (pp. 4690–4699).

  • Huang, D., Sun, J., & Wang, Y. (2012). The BUAA-VisNir face database instructions. Technical report.

  • Fu, C., Wu, X., Hu, Y., Huang, H., & He, R. (2022). DVG-Face: Dual variational generation for heterogeneous face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(6), 2938–2952.

  • Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In: Proceedings of the 27th international conference on neural information processing systems (pp. 2672–2680).

  • Guo, Y., Zhang, L., Hu, Y., He, X., & Gao, J. (2016). MS-Celeb-1M: A dataset and benchmark for large-scale face recognition. In: B. Leibe, J. Matas, N. Sebe, M. Welling (Eds.), Computer Vision – ECCV (pp. 87–102).

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 770–778). https://doi.org/10.1109/CVPR.2016.90.

  • He, R., Li, Y., Wu, X., Song, L., Chai, Z., & Wei, X. (2021). Coupled adversarial learning for semi-supervised heterogeneous face recognition. Pattern Recognition, 110, 107618.

  • He, R., Wu, X., Sun, Z., & Tan, T. (2017). Learning invariant deep representation for NIR–VIS face recognition. AAAI Conference on Artificial Intelligence, 4, 7.

  • He, R., Wu, X., Sun, Z., & Tan, T. (2019). Wasserstein CNN: Learning invariant features for NIR–VIS face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(7), 1761–1773.

  • Huang, G. B., Ramesh, M., Berg, T., & Learned-Miller, E. (2007). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst.

  • Huang, X., Lei, Z., Fan, M., Wang, X., & Li, S. Z. (2013). Regularized discriminative spectral regression method for heterogeneous face matching. IEEE Transactions on Image Processing, 22(1), 353–362.

  • Isola, P., Zhu, J., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5967–5976).

  • Jo, Y., Yang, S., & Kim, S. J. (2020). Investigating loss functions for extreme super-resolution. In: 2020 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW) (pp. 1705–1712).

  • Juefei-Xu, F., Pal, D.K., & Savvides, M. (2015). NIR-VIS heterogeneous face recognition via cross-spectral joint dictionary learning and reconstruction. In: 2015 IEEE conference on computer vision and pattern recognition workshops (pp. 141–150).

  • Keinert, F., Lazzaro, D., & Morigi, S. (2019). A robust group-sparse representation variational method with applications to face recognition. IEEE Transactions on Image Processing, 28(6), 2785–2798.

  • Klare, B. F., & Jain, A. K. (2013). Heterogeneous face recognition using kernel prototype similarities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(6), 1410–1422.

  • Lei, Z., & Li, S. Z. (2009). Coupled spectral regression for matching heterogeneous faces. In: 2009 IEEE conference on computer vision and pattern recognition (pp. 1123–1128).

  • Lezama, J., Qiu, Q., & Sapiro, G. (2017). Not afraid of the dark: NIR–VIS face recognition via cross-spectral hallucination and low-rank embedding. In: 2017 IEEE conference on computer vision and pattern recognition (pp. 6807–6816).

  • Li, S.Z., Yi, D., Lei, Z., & Liao, S. (2013). The CASIA NIR–VIS 2.0 face database. In: 2013 IEEE conference on computer vision and pattern recognition workshops (pp. 348–353).

  • Lin, D., & Tang, X. (2006). Inter-modality face recognition. In: Proceedings of the 9th European conference on computer vision - Volume Part IV, ECCV’06 (pp. 13–26). Berlin, Heidelberg: Springer-Verlag.

  • Liu, X., Song, L., Wu, X., & Tan, T. (2016). Transferring deep representation for NIR-VIS heterogeneous face recognition. In: 2016 international conference on biometrics (ICB) (pp. 1–8).

  • Park, T., Efros, A. A., Zhang, R., & Zhu, J. Y. (2020). Contrastive learning for unpaired image-to-image translation. In: European conference on computer vision.

  • Peng, C., Wang, N., Li, J., & Gao, X. (2019). DLFace: Deep local descriptor for cross-modality face recognition. Pattern Recognition, 90, 161–171.

  • Peng, C., Wang, N., Li, J., & Gao, X. (2019). Re-ranking high-dimensional deep local representation for NIR–VIS face recognition. IEEE Transactions on Image Processing, 28(9), 4553–4565.

  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In N. Navab, J. Hornegger, W. M. Wells, & A. F. Frangi (Eds.), Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 (pp. 234–241). Springer International Publishing.

  • Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF conference on computer vision and pattern recognition (pp. 4510–4520).

  • Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 815–823).

  • Shao, M., & Fu, Y. (2017). Cross-modality feature learning through generic hierarchical hyperlingual-words. IEEE Transactions on Neural Networks and Learning Systems, 28(2), 451–463.

  • Song, L., Zhang, M., Wu, X., & He, R. (2018). Adversarial discriminative heterogeneous face recognition. In: AAAI conference on artificial intelligence.

  • Sun, Y., Liang, D., Wang, X., & Tang, X. (2015). DeepID3: Face recognition with very deep neural networks. CoRR abs/1502.00873.

  • Wang, H., Zhang, H., Yu, L., Wang, L., & Yang, X. (2020). Facial feature embedded CycleGAN for VIS–NIR translation. In: IEEE international conference on acoustics, speech and signal processing (pp. 1903–1907).

  • Wang, R., Yang, J., Yi, D., & Li, S. Z. (2009). An analysis-by-synthesis method for heterogeneous face biometrics. In M. Tistarelli & M. S. Nixon (Eds.), Advances in biometrics (pp. 319–326). Springer.

  • Wu, F., Jing, X. Y., Feng, Y., Ji, Y. M., & Wang, R. (2021). Spectrum-aware discriminative deep feature learning for multi-spectral face recognition. Pattern Recognition, 111, 107632.

  • Wu, X., He, R., Sun, Z., & Tan, T. (2018). A light CNN for deep face representation with noisy labels. IEEE Transactions on Information Forensics and Security, 13(11), 2884–2896.

  • Yu, A., Wu, H., Huang, H., Lei, Z., & He, R. (2021). LAMP-HQ: A large-scale multi-pose high-quality database and benchmark for NIR–VIS face recognition. International Journal of Computer Vision, 129.

  • Yu, Y. F., Dai, D. Q., Ren, C. X., & Huang, K. K. (2017). Discriminative multi-layer illumination-robust feature extraction for face recognition. Pattern Recognition, 67, 201–212.

  • Zhang, K., Zhang, Z., Li, Z., & Yu, Q. (2016). Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10), 1499–1503.

  • Zhao, G., Huang, X., Taini, M., Li, S. Z., & Pietikäinen, M. (2011). Facial expression recognition from near-infrared videos. Image and Vision Computing, 29, 607–619.

  • Zhu, J., Park, T., Isola, P., & Efros, A.A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In: IEEE International conference on computer vision (ICCV) (pp. 2242–2251).

  • Zhu, J. Y., Zheng, W. S., Lu, F., & Lai, J. H. (2017). Illumination invariant single face image recognition under heterogeneous lighting condition. Pattern Recognition, 66, 313–327.

Acknowledgements

We thank Zhao et al. for providing the Oulu-CASIA NIR–VIS facial expression database (Zhao et al., 2011), which greatly helped us further train and test the proposed method. We also thank everyone who helped us collect data, select images, and build the WHU VIS–NIR paired face database, as well as the people shown in Figs. 5 and 6 for their generous support.

Author information

Corresponding author

Correspondence to Haijian Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by Hubei Provincial Natural Science Foundation of China under Grant 2022CFB084.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Wang, H., Zhang, H., Yu, L. et al. Facial feature embedded CycleGAN for VIS–NIR translation. Multidim Syst Sign Process 34, 423–446 (2023). https://doi.org/10.1007/s11045-023-00871-1
