Seeing and hearing too: Audio representation for video captioning | IEEE Conference Publication