Fast-Slow Transformer for Visually Grounding Speech | IEEE Conference Publication | IEEE Xplore