This bugzilla service is closed. All entries have been migrated to
Bug 1591 - EigenTensor evalGemm to use MKL batched gemm if MKL on
Summary: EigenTensor evalGemm to use MKL batched gemm if MKL on
Status: NEW
Alias: None
Product: Eigen
Classification: Unclassified
Component: Tensor
Version: 3.3 (current stable)
Hardware: x86 - general Linux
Importance: Normal Feature Request
Assignee: Nobody
Depends on:
Reported: 2018-08-27 23:10 UTC by william.tambellini
Modified: 2019-12-04 17:53 UTC
CC List: 5 users


Description william.tambellini 2018-08-27 23:10:16 UTC
As of today (Eigen 3.3.5 and master), according to unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h:
- evalGemv() calls internal::general_matrix_vector_product() from the 'regular' Eigen core.
- evalGemm() does not call internal::general_matrix_matrix_product but instead does a "manual" per-block product based on the gebp kernel.

Consequently, MKL cannot be used for Tensor mat-mat multiplication: building Eigen Tensor with MKL (EIGEN_USE_MKL_ALL) doesn't seem to change anything, at least for Tensor mat*mat products.

This feature request is to allow MKL batched gemm to be used via EigenTensor for 3+d mat-mat products.
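
For readers unfamiliar with the term, a "batched" mat-mat product over rank-3 data can be sketched in plain C++ as below. This is only a reference illustration of the semantics, not Eigen or MKL code: the names gemm_ref and batched_matmul_ref are hypothetical, and the row-major [batch, rows, cols] layout is an assumption made for readability (Eigen tensors are column-major by default).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive row-major GEMM used as a reference: C[m x n] = A[m x k] * B[k x n].
void gemm_ref(const float* a, const float* b, float* c,
              std::size_t m, std::size_t n, std::size_t k) {
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (std::size_t p = 0; p < k; ++p) {
        acc += a[i * k + p] * b[p * n + j];
      }
      c[i * n + j] = acc;
    }
  }
}

// Batched product over rank-3 row-major buffers (assumed layout):
// A is [batch, m, k], B is [batch, k, n], the result is [batch, m, n].
// Every slice is an independent, same-sized matrix product.
std::vector<float> batched_matmul_ref(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      std::size_t batch, std::size_t m,
                                      std::size_t n, std::size_t k) {
  assert(a.size() == batch * m * k);
  assert(b.size() == batch * k * n);
  std::vector<float> c(batch * m * n, 0.0f);
  for (std::size_t s = 0; s < batch; ++s) {
    gemm_ref(a.data() + s * m * k, b.data() + s * k * n,
             c.data() + s * m * n, m, n, k);
  }
  return c;
}
```

The loop over s issues one product per slice. A batched BLAS routine would receive the whole set of products in a single call, which is the behaviour this request asks evalGemm to be able to exploit.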

Comment 1 william.tambellini 2018-09-05 00:36:48 UTC
Batched gemm using MKL seems to be implemented in TensorFlow in:
method MklCblasGemmBatch()
class BatchMatMulMkl : public OpKernel

Does anyone know why it has not been implemented directly in EigenTensor?
Comment 2 Christoph Hertzberg 2018-09-05 16:08:37 UTC
I don't know the details of the Tensor internals, but it looks like it is not easily possible to simply call `internal::general_matrix_matrix_product`, because the TensorContraction would need to call a matrix-matrix product with inner strides (in some cases). Maybe it is worth falling back to a GEMM call, if inner strides allow this.

I also don't know if we want to have a batched GEMM implementation in Eigen -- I guess it could reduce overhead if lots of same-sized products are to be evaluated (and it could exploit multi-threading even for smaller matrices).

Does TensorFlow actually use batched GEMM for contraction? You just referred to the place where they wrap the corresponding MKL function.
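
The inner-stride concern raised in this comment can be illustrated with a small sketch. It assumes the usual column-major BLAS convention, where each GEMM operand is described by a pointer and a single leading dimension, so elements within a column must be contiguous. MatrixView, gemm_compatible and can_fall_back_to_gemm are hypothetical names for illustration and are not part of Eigen.

```cpp
#include <cstddef>

// Hypothetical description of a 2-D, column-major view into a larger buffer.
struct MatrixView {
  std::size_t rows;
  std::size_t cols;
  std::ptrdiff_t inner_stride;  // step between consecutive elements of a column
  std::ptrdiff_t outer_stride;  // step between consecutive columns (BLAS "ld")
};

// A BLAS-style GEMM operand needs unit inner stride and a leading
// dimension that is at least the number of rows.
bool gemm_compatible(const MatrixView& v) {
  return v.inner_stride == 1 &&
         v.outer_stride >= static_cast<std::ptrdiff_t>(v.rows);
}

// Fall back to a plain GEMM call only if all three operands qualify;
// otherwise the caller would keep using its own blocked kernel.
bool can_fall_back_to_gemm(const MatrixView& lhs, const MatrixView& rhs,
                           const MatrixView& dst) {
  return gemm_compatible(lhs) && gemm_compatible(rhs) && gemm_compatible(dst);
}
```

A check of this kind would run once per contraction, so its cost is negligible next to the product itself.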
Comment 3 william.tambellini 2018-09-14 23:07:19 UTC
Hi Christoph,
- Calling internal::general_matrix_matrix_product in the tensor evalGemm wouldn't take advantage of MKL batched matmul anyway, because I suppose general_matrix_matrix_product doesn't handle multiple matmuls at a time and doesn't seem to call cblas_?gemm_batch.

- I don't propose adding a generic batched gemm to the whole of Eigen. I'm just thinking of taking advantage of MKL in TensorContraction evalGemm. It could be similar to evalGemmXSMM(), be called evalGemmMKL(...), and be able to use MKL batched gemm.
Would limiting the implementation to TensorContraction be acceptable (no change to Eigen Core)?

- The perf gain of MKL batched gemm over non-batched is given here:

- TF, when built with MKL, seems to use batched GEMM, provided of course that the operator (BatchMatMulMkl) is actually used/called:
// This file uses MKL CBLAS batched xGEMM for acceleration of TF Batch
// Matrix-Matrix Multiplication (MatMul) operations.
// We currently register this kernel only for MKL supported data
// types (float, double, complex64, complex128). The macro INTEL_MKL is defined
// by the build system only when MKL is chosen as an option at configure stage
// and when it is undefined at build time, this file becomes an empty
// compilation unit
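
As a rough illustration of what an evalGemmMKL-style path would have to prepare: batched BLAS interfaces such as cblas_?gemm_batch take, to the best of my understanding, arrays of per-matrix pointers rather than a single buffer. The sketch below builds such pointer arrays from contiguous storage and feeds them to gemm_batch_ref, a hypothetical stand-in written in plain C++ so the example runs without MKL. It assumes column-major, non-transposed operands with tightly packed matrices, and is not Eigen, TensorFlow or MKL code.

```cpp
#include <cstddef>
#include <vector>

// One pointer per matrix for `batch` equally sized matrices stored back to back.
template <typename T>
std::vector<T*> batch_pointers(T* base, std::size_t batch,
                               std::size_t elems_per_matrix) {
  std::vector<T*> ptrs(batch);
  for (std::size_t s = 0; s < batch; ++s) {
    ptrs[s] = base + s * elems_per_matrix;
  }
  return ptrs;
}

// Hypothetical stand-in for a pointer-array batched GEMM.
// Column-major, no transposition: c[s] = a[s] * b[s] for every s,
// with a[s] of size m x k, b[s] of size k x n and c[s] of size m x n.
void gemm_batch_ref(const float* const* a, const float* const* b,
                    float* const* c, std::size_t m, std::size_t n,
                    std::size_t k, std::size_t batch) {
  for (std::size_t s = 0; s < batch; ++s) {
    for (std::size_t j = 0; j < n; ++j) {
      for (std::size_t i = 0; i < m; ++i) {
        float acc = 0.0f;
        for (std::size_t p = 0; p < k; ++p) {
          acc += a[s][i + p * m] * b[s][p + j * k];
        }
        c[s][i + j * m] = acc;
      }
    }
  }
}
```

In a real integration the call to gemm_batch_ref would be replaced by the vendor routine, and the pointer arrays would be derived from the contraction's own index mapping rather than from a fixed per-matrix offset.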

Comment 4 Nobody 2019-12-04 17:53:26 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to the project's GitLab instance and has been closed to further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance:
