GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark
Algorithm 784: GEMM-based level 3 BLAS: portability and optimization issues
ACM Transactions on Mathematical Software
Bo Kågström
Per Ling
An updated set of basic linear algebra subprograms (BLAS)