Abstract
Fine-grained image classification of wildlife species is a task of practical value and plays an important role in endangered animal conservation, environmental protection, and ecological conservation. However, the small inter-class differences between wildlife subclasses and the large intra-class differences within the same subclass pose a great challenge to wildlife species classification. In addition, the feature extraction capability of existing methods is insufficient: they ignore the contribution of effective shallow features and fail to capture subtle differences between images. To address these problems, this paper proposes an improved Swin Transformer architecture, called SFRSwin. Specifically, a shallow feature retention mechanism is proposed, consisting of a branch that extracts significant features from shallow layers to retain important low-level information, and which forms a dual-stream structure with the original network. SFRSwin was trained and tested on the public Stanford Dogs dataset and the small-scale Shark species dataset, achieving validation accuracies of 93.8% and 84.3%, improvements of 0.1% and 0.3% respectively over the original Swin Transformer baseline. In terms of complexity, FLOPs increased by only 2.7% and the number of parameters by only 0.15%.
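The dual-stream idea described above can be sketched in plain Python. This is a minimal illustration of one plausible reading of the abstract, not the paper's implementation: `shallow_significant_features` keeps the strongest per-channel responses of a shallow feature map (the paper's exact selection criterion is not stated here), and `fuse_dual_stream` combines that auxiliary branch with the original Swin stream's class logits via a hypothetical linear projection `weights` and scale `alpha`.

```python
def shallow_significant_features(fmap, k=4):
    # fmap: a shallow feature map as a list of 2-D channel grids
    # (list of list-of-list of floats). Keep the k largest per-channel
    # peak responses -- one plausible notion of "significant" shallow
    # features; the paper's actual criterion may differ.
    peaks = [max(max(row) for row in ch) for ch in fmap]
    return sorted(peaks, reverse=True)[:k]


def fuse_dual_stream(deep_logits, shallow_feats, weights, alpha=0.1):
    # Dual-stream fusion: adjust the original (deep) stream's class
    # logits by a weighted projection of the shallow-retention branch.
    # Both the projection matrix `weights` and the scale `alpha` are
    # illustrative assumptions.
    proj = [sum(w * f for w, f in zip(row, shallow_feats)) for row in weights]
    return [d + alpha * p for d, p in zip(deep_logits, proj)]


# Tiny worked example: 3 shallow channels of size 2x2, 2 classes.
fmap = [[[1, 2], [3, 4]], [[0, 9], [1, 1]], [[5, 5], [5, 5]]]
shallow = shallow_significant_features(fmap, k=2)   # peaks 4, 9, 5 -> [9, 5]
logits = fuse_dual_stream([0.0, 1.0], shallow, [[1, 0], [0, 1]])
print(shallow, logits)
```

In this toy fusion the shallow branch only nudges the deep logits (`alpha=0.1`), which mirrors the abstract's point that the auxiliary branch adds little overhead relative to the main network.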
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 62272256, the Shandong Provincial Natural Science Foundation under Grants ZR2021MF026 and ZR2023MF040, the Innovation Capability Enhancement Program for Small and Medium-sized Technological Enterprises of Shandong Province under Grants 2022TSGC2180 and 2022TSGC2123, the Innovation Team Cultivating Program of Jinan under Grant 202228093, the Piloting Fundamental Research Program for the Integration of Scientific Research, Education and Industry of Qilu University of Technology (Shandong Academy of Sciences) under Grants 2021JC02014 and 2022XD001, and the Talent Cultivation Promotion Program of Computer Science and Technology in Qilu University of Technology (Shandong Academy of Sciences) under Grants 2021PY05001 and 2023PY059.
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, S. et al. (2024). SFRSwin: A Shallow Significant Feature Retention Swin Transformer for Fine-Grained Image Classification of Wildlife Species. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14433. Springer, Singapore. https://doi.org/10.1007/978-981-99-8546-3_19
DOI: https://doi.org/10.1007/978-981-99-8546-3_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8545-6
Online ISBN: 978-981-99-8546-3