Custom data layout for memory parallelism | IEEE Conference Publication | IEEE Xplore