Auto-tuning Dense Matrix Multiplication for GPGPU with Cache | IEEE Conference Publication | IEEE Xplore