Created attachment 919
Benchmark for matrix multiplication
Attached is a benchmark that runs slower with Eigen 3.3 than with Eigen 3.2. The performance degradation is observed at least with MSVC (~2-3 times slower) and the Intel Compiler (~1.5-3 times slower).
It seems that, since version 3.3, compilers optimize matrix-vector and vector-vector multiplication inside a for loop less effectively (fewer functions are inlined).
Processor: Intel(R) Core(TM) i7-7700 CPU @ 3.60 GHz
Memory (RAM): 16.0 GB
Windows 10 x64
Visual Studio 2017 with MSVC compiler command line:
/GS /W1 /Zc:wchar_t /I"C:\Users\lgrechishnikov\source\repos\TestEigenPerformance\TestEigenPerformance" /Zi /Gm- /O2 /Fd"x64\Release\vc141.pdb" /Zc:inline /fp:precise /errorReport:prompt /WX- /Zc:forScope /Gd /MD /FC /Fa"x64\Release\" /EHsc /nologo /Fo"x64\Release\" /Fp"x64\Release\TestEigenPerformance.pch" /diagnostics:classic
Visual Studio 2017 with Intel Compiler 2018 command line:
/GS /W1 /Zc:wchar_t /I"C:\Users\lgrechishnikov\source\repos\TestEigenPerformance\TestEigenPerformance" /Zi /O2 /Fd"x64\Release\vc141.pdb" /fp:precise /Zc:forScope /MD /FC /Fa"x64\Release\" /EHsc /nologo /Fo"x64\Release\" /Qprof-dir "x64\Release\" /Fp"x64\Release\TestEigenPerformance.pch"
MSVC, Eigen 3.2.4 - 5.0 ms
MSVC, Eigen 3.3.7 - 16.2 ms
Intel Compiler, Eigen 3.2.4 - 4.52 ms
Intel Compiler, Eigen 3.3.7 - 17.17 ms
Indeed, both MSVC and ICC struggle to properly inline some functions. Clang and GCC have no such problem: https://godbolt.org/z/QlbmJf
To work around this issue, you can try adding EIGEN_STRONG_INLINE to the functions that are not properly inlined, until the compiler inlines them.
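For readers unfamiliar with this kind of workaround, here is a minimal sketch of the idea that does not use Eigen at all. The macro SKETCH_STRONG_INLINE and the helpers dot3, mat_vec and apply_repeatedly are hypothetical names invented for this illustration. The macro is assumed to behave roughly like EIGEN_STRONG_INLINE, expanding to __forceinline on MSVC-compatible compilers and to plain inline elsewhere; check Eigen's own macro definition before relying on that.

```cpp
#include <array>
#include <cstddef>

// Hypothetical macro for illustration only; Eigen ships its own
// EIGEN_STRONG_INLINE. Assumption: compilers that define _MSC_VER
// (MSVC, and the Intel Compiler on Windows) accept __forceinline.
#if defined(_MSC_VER)
  #define SKETCH_STRONG_INLINE __forceinline
#else
  #define SKETCH_STRONG_INLINE inline
#endif

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // three rows, row-major

// Tiny helper that we want folded into the caller's loop body.
SKETCH_STRONG_INLINE double dot3(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Matrix-vector product built from the helper above; also force-inlined
// so the whole call chain can collapse into the hot loop.
SKETCH_STRONG_INLINE Vec3 mat_vec(const Mat3& m, const Vec3& v) {
    return Vec3{dot3(m[0], v), dot3(m[1], v), dot3(m[2], v)};
}

// Hot loop: applies the matrix to the vector n times.
inline Vec3 apply_repeatedly(const Mat3& m, Vec3 v, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        v = mat_vec(m, v);
    }
    return v;
}
```

Calling apply_repeatedly(m, v, n) from a timed loop and comparing the run time with and without the macro is one way to see whether the compiler was skipping the inlining. Forcing inlining can also increase code size, so it is probably best applied only to functions that profiling or the generated assembly shows are not being inlined.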
-- GitLab Migration Automatic Message --
This bug has been migrated to gitlab.com's GitLab instance and has been closed to further activity.
You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/1670.