Multi-Task Learning of Hierarchical Vision-Language Representation | IEEE Conference Publication