Multimodal embedding fusion for robust speaker role recognition in video broadcast | IEEE Conference Publication | IEEE Xplore