I wanted to try out Eigen 3.3beta1 and found that my code was running approx. ten times slower than with Eigen 3.2.8 By playing around with the code I found that it will also be slow in Eigen 3.2.8 if const is removed from the line "const Vector3d yi{ tmpyi };" Testcode: (Inlcude eigen/core und chrono) int main(int argc, char** argv) { using namespace Eigen; Matrix3d rot, rott; rot << 0, 0, 1, 1, 0, 0, 0, 1, 0; rott << rot.transpose(); const double Prefactor1 = 1.5456; const double Prefactor2 = 3.5456; Vector3d tmpyi; tmpyi << 2, 1, 9; const Vector3d yi{ tmpyi }; std::chrono::time_point<std::chrono::high_resolution_clock, std::chrono::nanoseconds> det_init_begin, det_init_finished; det_init_begin = std::chrono::high_resolution_clock::now(); Matrix3d tmp; for (int i = 0; i < 1000000; ++i) { tmp = (Prefactor1*(rot*yi.asDiagonal()*rott*-1 * rott + rott*yi.asDiagonal()*rot*rot) + Prefactor2*(yi.asDiagonal()*Matrix3d::Ones()*yi.asDiagonal())); tmp; } tmp; det_init_finished = std::chrono::high_resolution_clock::now(); std::cout << "It took " << (det_init_finished - det_init_begin).count() / 1E6 << " ms to calc 10^6 values " << std::endl << std::endl; system("pause"); return 0; }
Be careful with such benchmark as the compiler is free to move some computation out of the loop. Anyway, I guess the problem is mostly on inline, so please first try with the latest devel branch as we've already enforced the inline of some critical functions for MSVC. Also, you can avoid redundant computations by rewriting your expressions as: Vector3d p1yi = Prefactor1*yi; Matrix3d rr = rot*rot; Matrix3d tmp0 = rot*p1yi.asDiagonal(); Matrix3d tmp1 = rott*p1yi.asDiagonal(); return tmp1*rr - tmp0*rr.transpose() + (Prefactor2*yi*yi.transpose()); Also, I cannot reproduce such a perf. regression with clang or gcc.
Created attachment 695 [details] Test code Did some additional tests and found this easy test case. Tested with following options under MSVS 2013 (Update 2) / 2015 (Update 1 and 2) (Do not have any other compilers installed to test it out) Compiler options: /FR"x64\Release\" /GS /GL /W3 /Gy /Zc:wchar_t /Zi /Gm- /O2 /Ob2 /sdl /Fd"x64\Release\vc140.pdb" /Zc:inline /D "_MBCS" /errorReport:prompt /WX- /Zc:forScope /arch:AVX2 /Gd /Oi /MT /Fa"x64\Release\" /EHsc /nologo /Fo"x64\Release\" /Ot /Fp"x64\Release\Clustertest.pch" Disassembly in the slow case has this additional call: 00007FF727782B74 E8 D7 40 00 00 call Eigen::internal::generic_product_impl<Eigen::Matrix<double,3,1,0,3,1>,Eigen::Transpose<Eigen::Matrix<double,3,1,0,3,1> >,Eigen::DenseShape,Eigen::DenseShape,3>::evalTo<Eigen::Matrix<double,3,3,0,3,3> > (07FF727786C50h) so it is a inline problem. The question now is why is it happening in eigen 3.3 and not in 3.2.8? Speed drops from 12-14ms to 140-160ms for me. (Tested with build: 518152aee731 and 4c94692de3e5 (latest?)) Another call I see is: 00007FF727782D01 E8 4A 34 00 00 call Eigen::DenseBase<Eigen::Matrix<double,3,1,0,3,1> >::operator<< (07FF727786150h) I think this is normal and does not greatly affect speed. And thanks for the yi*yi.transpose() optimization. Helped a bit getting my code faster.
Hmm changing this line (89) in GeneralProduct.h: template<> struct product_type_selector<Small, Small, 1> { enum { ret = LazyCoeffBasedProductMode}; }; to template<> struct product_type_selector<Small, Small, 1> { enum { ret = OuterProduct}; }; makes it a lot faster (about 22ms) but it is still slower than before
So i solved my performance problem by adding EIGEN_STRONG_INLINE to the following functions by just follwing the disassembly calls: Eigen::internal::generic_product_impl<Eigen::Matrix<double,3,1,0,3,1>,Eigen::Transpose<Eigen::Matrix<double,3,1,0,3,1>>,Eigen::DenseShape,Eigen::DenseShape,3>::evalTo<Eigen::Matrix<double,3,3,0,3,3> >() Eigen::internal::Assignment<Eigen::Matrix<double, 3, 3, 0, 3, 3>, Eigen::Product<Eigen::Matrix<double, 3, 1, 0, 3, 1>, Eigen::Transpose<Eigen::Matrix<double, 3, 1, 0, 3, 1> >, 0>, Eigen::internal::assign_op<double>, Eigen::internal::Dense2Dense, double>::run() Eigen::internal::variable_if_dynamic<__int64, 3>::variable_if_dynamic<__int64, 3>() i do not know if it is a good solution but it works
If it solves the issue, I'm fine with adding some additional EIGEN_STRONG_INLINE -- but I don't think that <Small,Small, 1> should be changed to OuterProduct (comment 3).
I'm fine with adding such EIGEN_STRONG_INLINE, could you attach the diff so that we are sure we are adding to right ones. Thanks.
https://bitbucket.org/eigen/eigen/commits/b4d155cd8835/
-- GitLab Migration Automatic Message -- This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/1181.