Multimodal active speaker detection using cross-attention and contextual information | IEEE Conference Publication | IEEE Xplore