As of today (Eigen 3.3.5 and master), according to unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h:
- evalGemv() calls internal::general_matrix_vector_product() from the 'regular' Eigen core.
- evalGemm() does not call internal::general_matrix_matrix_product; instead it performs a "manual" per-block product based on the block kernel (gebp).
Consequently, MKL cannot be used for Tensor mat-mat multiplication: building Eigen Tensor with MKL (EIGEN_USE_MKL_ALL) does not seem to change anything, at least for Tensor mat*mat products.
This feature request is to allow using the MKL batched GEMM product via Eigen Tensor for 3D and higher-dimensional mat-mat products.
Batched GEMM using MKL seems to be implemented in TensorFlow in:
class BatchMatMulMkl : public OpKernel
Does anyone know why it has not been implemented directly in Eigen Tensor?
I don't know the details of the Tensor internals, but it does not look easy to simply call `internal::general_matrix_matrix_product`, because in some cases TensorContraction would need a matrix-matrix product that supports inner strides. Maybe it is worth falling back to a GEMM call when the inner strides allow it.
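To make the stride condition concrete, here is a minimal sketch of the kind of check such a fallback might perform. It assumes column-major operands and relies on the fact that BLAS-style GEMM addresses element (i, j) as data[i + j * ld]. The names StridedView, gemm_compatible and can_fall_back_to_gemm are hypothetical and do not exist in Eigen.

```cpp
#include <cstddef>

// Hypothetical description of a column-major operand as a contraction
// might see it. Not an Eigen type.
struct StridedView {
    std::size_t rows;
    std::size_t cols;
    std::size_t inner_stride;  // distance between consecutive elements of a column
    std::size_t outer_stride;  // distance between the starts of consecutive columns
};

// A BLAS-style GEMM reads element (i, j) at data[i + j * ld], so it can only
// consume a view with unit inner stride and a leading dimension that covers
// at least one full column.
bool gemm_compatible(const StridedView& v) {
    const std::size_t min_ld = v.rows > 0 ? v.rows : 1;
    return v.inner_stride == 1 && v.outer_stride >= min_ld;
}

// Fall back to GEMM only when the shapes agree and all three operands
// qualify; otherwise the existing blocked kernel path would be kept.
bool can_fall_back_to_gemm(const StridedView& lhs,
                           const StridedView& rhs,
                           const StridedView& res) {
    return lhs.cols == rhs.rows &&
           res.rows == lhs.rows && res.cols == rhs.cols &&
           gemm_compatible(lhs) && gemm_compatible(rhs) && gemm_compatible(res);
}
```

An operand with inner_stride != 1 (for example, a slice along a non-leading tensor dimension) fails the check and would stay on the current gebp path.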
I also don't know if we want to have a batched GEMM implementation in Eigen -- I guess it could reduce overhead if lots of same-sized products are to be evaluated (and it could exploit multi-threading even for smaller matrices).
Does TensorFlow actually use batched GEMM for contraction? You just referred to the place where they wrap the corresponding MKL function.
- Calling internal::general_matrix_matrix_product in the tensor evalGemm would not take advantage of MKL batched matmul anyway, because I suppose general_matrix_matrix_product does not handle multiple matmuls at a time and does not seem to call cblas_?gemm_batch.
- I don't propose adding a general, generic batched GEMM to the whole of Eigen. I am just thinking of taking advantage of MKL in TensorContraction evalGemm. It could be similar to evalGemmXSMM(), could be called evalGemmMKL(...), and would be able to use MKL batched GEMM.
Would limiting the implementation to TensorContraction be acceptable (no change to Eigen Core)?
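As a rough illustration of what such an evalGemmMKL(...) path would have to prepare, the sketch below builds one pointer per batch slice of a rank-3 tensor and hands the pointer arrays to a batched product. It assumes column-major storage with the batch index as the slowest dimension. gemm_batch_reference is a plain C++ stand-in written for this example; it does not reproduce the signature of cblas_?gemm_batch, and batched_matmul is likewise a hypothetical helper, not an Eigen function.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a batched GEMM with a pointer-array interface: computes
// C[b] = A[b] * B[b] for every batch b. All matrices are column-major and
// densely packed; A[b] is m x k, B[b] is k x n, C[b] is m x n.
void gemm_batch_reference(std::size_t m, std::size_t n, std::size_t k,
                          const std::vector<const double*>& a,
                          const std::vector<const double*>& b,
                          const std::vector<double*>& c) {
    for (std::size_t batch = 0; batch < c.size(); ++batch) {
        for (std::size_t j = 0; j < n; ++j) {
            for (std::size_t i = 0; i < m; ++i) {
                double acc = 0.0;
                for (std::size_t p = 0; p < k; ++p) {
                    acc += a[batch][i + p * m] * b[batch][p + j * k];
                }
                c[batch][i + j * m] = acc;
            }
        }
    }
}

// Hypothetical helper: multiplies two rank-3 tensors slice by slice.
// Slice i of lhs starts at offset i * m * k, and similarly for rhs and out.
std::vector<double> batched_matmul(const std::vector<double>& lhs,
                                   const std::vector<double>& rhs,
                                   std::size_t m, std::size_t n, std::size_t k,
                                   std::size_t batches) {
    std::vector<double> out(m * n * batches, 0.0);
    std::vector<const double*> a(batches), b(batches);
    std::vector<double*> c(batches);
    for (std::size_t i = 0; i < batches; ++i) {
        a[i] = lhs.data() + i * m * k;
        b[i] = rhs.data() + i * k * n;
        c[i] = out.data() + i * m * n;
    }
    // An MKL-backed path would pass these pointer arrays to the vendor's
    // batched routine here instead of the reference loop.
    gemm_batch_reference(m, n, k, a, b, c);
    return out;
}
```

The point of the pointer-array layout is that all same-sized products are submitted in one call, which is where a batched backend could save per-call overhead and parallelize across small matrices.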
- The performance gain of MKL batched GEMM over non-batched GEMM is given here:
- TF, when built with MKL, seems to use batched GEMM, provided of course that the operator (BatchMatMulMkl) is actually used/called:
// This file uses MKL CBLAS batched xGEMM for acceleration of TF Batch
// Matrix-Matrix Multiplication (MatMul) operations.
// We currently register this kernel only for MKL supported data
// types (float, double, complex64, complex128). The macro INTEL_MKL is defined
// by the build system only when MKL is chosen as an option at configure stage
// and when it is undefined at build time, this file becomes an empty
// compilation unit
-- GitLab Migration Automatic Message --
This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/1591.