Enhancing Fine-Grained 3D Object Recognition Using Hybrid Multi-Modal Vision Transformer-CNN Models | IEEE Conference Publication | IEEE Xplore