
Learning Robust Facial Representation From the View of Diversity and Closeness

Published in: International Journal of Computer Vision

Abstract

Recent years have witnessed remarkable progress in deep face recognition, driven by advances in both deep convolutional neural networks and loss functions. In this work, we provide an intrinsic analysis that reveals the working mechanism of softmax from the view of closeness and diversity. We find that enhancing the closeness of easy samples and preserving the diversity of hard samples improves the robustness of feature representations. However, most previous works aim to improve the closeness of intraclass samples and fail to emphasize the diversity of hard samples. To address this issue, we develop a novel robust feature representation model that leverages rate-distortion theory to characterize the proportions of closeness and diversity, in conjunction with a designed hard sample mining scheme that further enhances the discriminative ability of the deep model. Specifically, the proposed model compresses the coding rate of easy samples for closeness and expands the coding rate of hard samples for diversity. A novel hard sample mining scheme ensures that easy and hard samples are balanced in each batch, and that the hard samples in each batch include both intraclass samples affected by various noises and interclass samples with similar appearances. Extensive experimental results on popular benchmarks demonstrate the superiority of our proposed approach over state-of-the-art competitors.
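The closeness–diversity objective described in the abstract can be illustrated with the lossy coding rate from rate-distortion theory: compressing the coding rate of easy samples pulls them together, while expanding the coding rate of hard samples keeps them spread out. The sketch below is illustrative only — the function names, the distortion parameter `eps`, and the simple difference-of-rates loss are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Lossy coding rate of a feature matrix Z (n samples x d dims).

    R(Z) = 1/2 * logdet(I + d / (n * eps^2) * Z^T Z), the standard
    rate-distortion estimate of the bits needed to encode Z up to
    distortion eps. eps=0.5 is an illustrative choice.
    """
    n, d = Z.shape
    scale = d / (n * eps ** 2)
    # slogdet is numerically stabler than log(det(...)) for large d.
    _, logdet = np.linalg.slogdet(np.eye(d) + scale * Z.T @ Z)
    return 0.5 * logdet

def closeness_diversity_loss(Z_easy, Z_hard, eps=0.5):
    """Toy objective: minimize the easy-sample coding rate (closeness)
    while maximizing the hard-sample coding rate (diversity)."""
    return coding_rate(Z_easy, eps) - coding_rate(Z_hard, eps)
```

In this toy form, collapsed (identical) easy features drive the first term toward zero, while well-spread hard features make the second term large, so a batch that balances easy and hard samples, as the proposed mining scheme does, gives both terms something to act on.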


The full article includes Figs. 1–11 and Algorithms 1–2.


Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author upon reasonable request.


Author information

Corresponding author

Correspondence to Jianjun Qian.

Additional information

Communicated by Wanli Ouyang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhao, C., Qian, J., Zhu, S. et al. Learning Robust Facial Representation From the View of Diversity and Closeness. Int J Comput Vis 132, 410–427 (2024). https://doi.org/10.1007/s11263-023-01893-9

