Summary: | Performance Issue changing from 3.2.8 to 3.3beta1 | ||||||
---|---|---|---|---|---|---|---|
Product: | Eigen | Reporter: | neumann | ||||
Component: | Core - matrix products | Assignee: | Nobody <eigen.nobody> | ||||
Status: | RESOLVED FIXED | ||||||
Severity: | Performance Problem | CC: | chtz, gael.guennebaud | ||||
Priority: | Normal | ||||||
Version: | 3.3 (current stable) | ||||||
Hardware: | x86 - 64-bit | ||||||
OS: | All | ||||||
Whiteboard: | |||||||
Bug Depends on: | |||||||
Bug Blocks: | 558 | ||||||
Attachments: |
|
Description
neumann
2016-03-18 12:37:28 UTC
Be careful with such benchmark as the compiler is free to move some computation out of the loop. Anyway, I guess the problem is mostly on inline, so please first try with the latest devel branch as we've already enforced the inline of some critical functions for MSVC. Also, you can avoid redundant computations by rewriting your expressions as: Vector3d p1yi = Prefactor1*yi; Matrix3d rr = rot*rot; Matrix3d tmp0 = rot*p1yi.asDiagonal(); Matrix3d tmp1 = rott*p1yi.asDiagonal(); return tmp1*rr - tmp0*rr.transpose() + (Prefactor2*yi*yi.transpose()); Also, I cannot reproduce such a perf. regression with clang or gcc. Created attachment 695 [details]
Test code
Did some additional tests and found this easy test case.
Tested with following options under MSVS 2013 (Update 2) / 2015 (Update 1 and 2)
(Do not have any other compilers installed to test it out)
Compiler options:
/FR"x64\Release\" /GS /GL /W3 /Gy /Zc:wchar_t /Zi /Gm- /O2 /Ob2 /sdl /Fd"x64\Release\vc140.pdb" /Zc:inline /D "_MBCS" /errorReport:prompt /WX- /Zc:forScope /arch:AVX2 /Gd /Oi /MT /Fa"x64\Release\" /EHsc /nologo /Fo"x64\Release\" /Ot /Fp"x64\Release\Clustertest.pch"
Disassembly in the slow case has this additional call:
00007FF727782B74 E8 D7 40 00 00 call Eigen::internal::generic_product_impl<Eigen::Matrix<double,3,1,0,3,1>,Eigen::Transpose<Eigen::Matrix<double,3,1,0,3,1> >,Eigen::DenseShape,Eigen::DenseShape,3>::evalTo<Eigen::Matrix<double,3,3,0,3,3> > (07FF727786C50h)
so it is a inline problem. The question now is why is it happening in eigen 3.3 and not in 3.2.8? Speed drops from 12-14ms to 140-160ms for me.
(Tested with build: 518152aee731 and 4c94692de3e5 (latest?))
Another call I see is:
00007FF727782D01 E8 4A 34 00 00 call Eigen::DenseBase<Eigen::Matrix<double,3,1,0,3,1> >::operator<< (07FF727786150h)
I think this is normal and does not greatly affect speed.
And thanks for the yi*yi.transpose() optimization. Helped a bit getting my code faster.
Hmm changing this line (89) in GeneralProduct.h: template<> struct product_type_selector<Small, Small, 1> { enum { ret = LazyCoeffBasedProductMode}; }; to template<> struct product_type_selector<Small, Small, 1> { enum { ret = OuterProduct}; }; makes it a lot faster (about 22ms) but it is still slower than before So i solved my performance problem by adding EIGEN_STRONG_INLINE to the following functions by just follwing the disassembly calls: Eigen::internal::generic_product_impl<Eigen::Matrix<double,3,1,0,3,1>,Eigen::Transpose<Eigen::Matrix<double,3,1,0,3,1>>,Eigen::DenseShape,Eigen::DenseShape,3>::evalTo<Eigen::Matrix<double,3,3,0,3,3> >() Eigen::internal::Assignment<Eigen::Matrix<double, 3, 3, 0, 3, 3>, Eigen::Product<Eigen::Matrix<double, 3, 1, 0, 3, 1>, Eigen::Transpose<Eigen::Matrix<double, 3, 1, 0, 3, 1> >, 0>, Eigen::internal::assign_op<double>, Eigen::internal::Dense2Dense, double>::run() Eigen::internal::variable_if_dynamic<__int64, 3>::variable_if_dynamic<__int64, 3>() i do not know if it is a good solution but it works If it solves the issue, I'm fine with adding some additional EIGEN_STRONG_INLINE -- but I don't think that <Small,Small, 1> should be changed to OuterProduct (comment 3). I'm fine with adding such EIGEN_STRONG_INLINE, could you attach the diff so that we are sure we are adding to right ones. Thanks. -- GitLab Migration Automatic Message -- This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/1181. |