Abstract
Fine-grained image classification of wildlife species is a task of practical value and plays an important role in endangered animal conservation, environmental protection, and ecological conservation. However, the small inter-class differences between wildlife subclasses and the large intra-class differences within the same subclass pose a great challenge to wildlife species classification. In addition, the feature extraction capability of existing methods is insufficient: they ignore the contribution of effective shallow features and fail to capture subtle differences between images. To address these problems, this paper proposes an improved Swin Transformer architecture, called SFRSwin. Specifically, a shallow feature retention mechanism is proposed, consisting of a branch that extracts significant features from shallow layers to retain important low-level information, and which forms a dual-stream structure with the original network. SFRSwin was trained and tested on the public Stanford Dogs dataset and the small-scale Shark species dataset, achieving validation accuracies of 93.8% and 84.3%, improvements of 0.1% and 0.3% respectively over the original Swin Transformer baseline. In terms of complexity, FLOPs increased by only 2.7% and the number of parameters by only 0.15%.
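The dual-stream idea described above can be sketched in plain Python. This is a minimal illustration of one plausible reading of the abstract, not the paper's implementation: `shallow_significant_features` keeps the strongest per-channel responses of a shallow feature map (the paper's exact selection criterion is not stated here), and `fuse_dual_stream` combines that auxiliary branch with the original Swin stream's class logits via a hypothetical linear projection `weights` and scale `alpha`.

```python
def shallow_significant_features(fmap, k=4):
    # fmap: a shallow feature map as a list of 2-D channel grids
    # (list of list-of-list of floats). Keep the k largest per-channel
    # peak responses -- one plausible notion of "significant" shallow
    # features; the paper's actual criterion may differ.
    peaks = [max(max(row) for row in ch) for ch in fmap]
    return sorted(peaks, reverse=True)[:k]


def fuse_dual_stream(deep_logits, shallow_feats, weights, alpha=0.1):
    # Dual-stream fusion: adjust the original (deep) stream's class
    # logits by a weighted projection of the shallow-retention branch.
    # Both the projection matrix `weights` and the scale `alpha` are
    # illustrative assumptions.
    proj = [sum(w * f for w, f in zip(row, shallow_feats)) for row in weights]
    return [d + alpha * p for d, p in zip(deep_logits, proj)]


# Tiny worked example: 3 shallow channels of size 2x2, 2 classes.
fmap = [[[1, 2], [3, 4]], [[0, 9], [1, 1]], [[5, 5], [5, 5]]]
shallow = shallow_significant_features(fmap, k=2)   # peaks 4, 9, 5 -> [9, 5]
logits = fuse_dual_stream([0.0, 1.0], shallow, [[1, 0], [0, 1]])
print(shallow, logits)
```

In this toy fusion the shallow branch only nudges the deep logits (`alpha=0.1`), which mirrors the abstract's point that the auxiliary branch adds little overhead relative to the main network.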
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 62272256, the Shandong Provincial Natural Science Foundation under Grants ZR2021MF026 and ZR2023MF040, the Innovation Capability Enhancement Program for Small and Medium-sized Technological Enterprises of Shandong Province under Grants 2022TSGC2180 and 2022TSGC2123, the Innovation Team Cultivating Program of Jinan under Grant 202228093, the Piloting Fundamental Research Program for the Integration of Scientific Research, Education and Industry of Qilu University of Technology (Shandong Academy of Sciences) under Grants 2021JC02014 and 2022XD001, and the Talent Cultivation Promotion Program of Computer Science and Technology in Qilu University of Technology (Shandong Academy of Sciences) under Grants 2021PY05001 and 2023PY059.
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, S. et al. (2024). SFRSwin: A Shallow Significant Feature Retention Swin Transformer for Fine-Grained Image Classification of Wildlife Species. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14433. Springer, Singapore. https://doi.org/10.1007/978-981-99-8546-3_19
DOI: https://doi.org/10.1007/978-981-99-8546-3_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8545-6
Online ISBN: 978-981-99-8546-3