I was having trouble with the matrix exponential function on Windows with Visual C++: it refused to produce optimized code. After experimenting a bit, I came up with this simple test. It seems to show a huge performance difference between gcc and MSVC for fixed-size vectorizable classes, e.g. Eigen::Matrix4d:

    #include <ctime>
    #include <iostream>
    #include <Eigen/Dense>

    using Eigen::Matrix4d;

    int main()
    {
      Matrix4d A = Matrix4d::Random();
      Matrix4d A_Id = Matrix4d::Identity();
      Matrix4d A2 = A * A;
      Matrix4d A4 = A2 * A2;
      Matrix4d m_tmp2;
      const double b[] = {17297280., 8648640., 1995840., 277200.,
                          25200., 1512., 56., 1.};

      std::clock_t start = std::clock();
      for (int ct = 0; ct < 1e7; ++ct)
      {
        m_tmp2 = b[7]*A + b[5]*A4 + b[3]*A2 + b[1]*A_Id;
      }
      std::clock_t stop = std::clock();
      std::cout << "Time taken: " << (stop - start)/(double)CLOCKS_PER_SEC << std::endl;
      return 0;
    }

Using gcc 4.6.1 (TDM-GCC):

    >> g++ -s -O3 -DNDEBUG -Ieigen

gives

    Time taken: 0.03

Using MSVC 2010 64-bit:

    >> cl /O2 /D"NDEBUG" -I"eigen"

gives

    Time taken: 0.213

The difference vanishes with dynamic-sized matrices (gcc is actually a little slower there).
Be careful with this kind of simple benchmark: the compiler may optimize your code too aggressively. For instance, it could remove the for loop entirely, or exploit the fact that the values of the objects are known at compile time. It is better to put the expression in a small function marked EIGEN_DONT_INLINE:

    EIGEN_DONT_INLINE void foo(Matrix4d& A, ....)
    {
      m_tmp2 = b[7]*A + b[5]*A4 + b[3]*A2 + b[1]*A_Id;
    }

and then call this function many times to benchmark it. However, I'm not sure EIGEN_DONT_INLINE does anything with MSVC, so perhaps implement the function in a separate .cpp file to make sure it won't be inlined.
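A minimal, standard-library-only sketch of that advice, under some assumptions: BENCH_NOINLINE is a hypothetical macro (not Eigen's EIGEN_DONT_INLINE) built from the GCC/Clang `__attribute__((noinline))` and MSVC `__declspec(noinline)` attributes, and Mat4 is a hypothetical plain 16-double stand-in for Matrix4d so the example compiles without Eigen. The timed expression lives in an out-of-line function that receives its inputs by reference, which makes it much harder for the optimizer to hoist the work out of the loop or fold it into constants.

```cpp
#include <array>
#include <cassert>
#include <chrono>

// Hypothetical portable "never inline" macro (an assumption, not Eigen's own).
#if defined(_MSC_VER)
#define BENCH_NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
#define BENCH_NOINLINE __attribute__((noinline))
#else
#define BENCH_NOINLINE
#endif

// Hypothetical stand-in for Eigen::Matrix4d: 16 doubles, element-wise ops only.
using Mat4 = std::array<double, 16>;

// The expression under test, kept out of line so each call does real work:
// out = b[7]*A + b[5]*A4 + b[3]*A2 + b[1]*Id
BENCH_NOINLINE void poly_terms(Mat4& out, const Mat4& A, const Mat4& A2,
                               const Mat4& A4, const Mat4& Id, const double* b) {
  for (int i = 0; i < 16; ++i)
    out[i] = b[7] * A[i] + b[5] * A4[i] + b[3] * A2[i] + b[1] * Id[i];
}

// Calls poly_terms `iterations` times and returns elapsed wall-clock seconds.
double time_poly_terms(int iterations, Mat4& out, const Mat4& A, const Mat4& A2,
                       const Mat4& A4, const Mat4& Id, const double* b) {
  assert(iterations > 0);
  const auto start = std::chrono::steady_clock::now();
  for (int ct = 0; ct < iterations; ++ct)
    poly_terms(out, A, A2, A4, Id, b);
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

Usage would be something like `time_poly_terms(10000000, m_tmp2, A, A2, A4, A_Id, b)` with the matrices and coefficients from the original test. If the function is moved to a separate .cpp file instead, note that link-time optimization (MSVC /GL, GCC -flto) can still inline across translation units.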
OK, ICC seems to perform poorly as well. I checked the assembly, and the reason is poor inlining. I bet the cause is the same with MSVC. To check the asm, I put an enclosing pair:

    EIGEN_ASM_COMMENT("mybegin");
    ...
    EIGEN_ASM_COMMENT("myend");

around the critical expression, which makes the relevant asm lines easy to find. You could try to figure out which functions are poorly inlined with MSVC, declare them with EIGEN_STRONG_INLINE, and get back to us.
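As far as I can tell from Eigen's Macros.h, EIGEN_ASM_COMMENT emits a GNU-style inline asm comment only for GCC-compatible compilers on x86 and expands to nothing elsewhere; 64-bit MSVC has no inline assembly at all, so with MSVC an assembly listing (cl /FAs, which interleaves source lines with the generated code) is probably the closer substitute. The sketch below is a hypothetical standalone equivalent (ASM_MARKER and axpy4 are illustrative names, not Eigen code) showing how the markers bracket a kernel.

```cpp
#include <cassert>

// Hypothetical stand-alone equivalent of EIGEN_ASM_COMMENT (an assumption,
// not Eigen's definition). With GCC/Clang on x86 it plants a comment line in
// the generated assembly; elsewhere (e.g. MSVC) it expands to nothing.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define ASM_MARKER(text) __asm__ volatile("# " text)
#else
#define ASM_MARKER(text)
#endif

// Toy kernel: y += a * x for 4 doubles. Compile with `g++ -O3 -S` and search
// the .s file for "mybegin"/"myend" to locate this loop's instructions.
// The markers are approximate: the compiler may still move nearby code
// across them, since they declare no memory dependencies.
void axpy4(double* y, const double* x, double a) {
  assert(y != nullptr && x != nullptr);
  ASM_MARKER("mybegin");
  for (int i = 0; i < 4; ++i) y[i] += a * x[i];
  ASM_MARKER("myend");
}
```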
EDIT: actually, ICC does not inline the assignment, but the rest is properly inlined, so inlining is not the performance issue for ICC. The real reason is an excessive use of the movddup instruction, which is executed multiple times (4) on the same variable when it should be executed only once per b[i].

Could you check the assembly produced by MSVC?
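To illustrate what "once per b[i]" means, here is a hypothetical hand-vectorized linear combination written with SSE2 intrinsics (lincomb4x4 is an illustrative name; this is not Eigen's packet code). Each coefficient is broadcast into a register once, before the loop over the 8 two-double packets of a 4x4 matrix, rather than being re-broadcast for every packet. Depending on the compiler and target flags, `_mm_set1_pd` may compile to movddup (SSE3) or to an unpcklpd/shufpd sequence; a scalar fallback keeps the sketch compiling on non-x86 targets.

```cpp
#include <cassert>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define LINCOMB_USE_SSE2 1
#endif

// Hypothetical kernel: out = c[0]*m[0] + c[1]*m[1] + c[2]*m[2] + c[3]*m[3],
// where each m[k] (and out) is a 4x4 matrix stored as 16 contiguous doubles.
void lincomb4x4(double* out, const double* const m[4], const double c[4]) {
  assert(out != nullptr && m != nullptr && c != nullptr);
#ifdef LINCOMB_USE_SSE2
  // Broadcast every coefficient exactly once, outside the packet loop.
  __m128d cv[4];
  for (int k = 0; k < 4; ++k) cv[k] = _mm_set1_pd(c[k]);
  // 16 doubles = 8 packets of 2; the broadcast registers are reused for each.
  for (int i = 0; i < 16; i += 2) {
    __m128d acc = _mm_mul_pd(cv[0], _mm_loadu_pd(m[0] + i));
    for (int k = 1; k < 4; ++k)
      acc = _mm_add_pd(acc, _mm_mul_pd(cv[k], _mm_loadu_pd(m[k] + i)));
    _mm_storeu_pd(out + i, acc);
  }
#else
  // Portable scalar fallback with the same evaluation order.
  for (int i = 0; i < 16; ++i) {
    double acc = c[0] * m[0][i];
    for (int k = 1; k < 4; ++k) acc += c[k] * m[k][i];
    out[i] = acc;
  }
#endif
}
```

The point of the layout is that the number of broadcasts scales with the number of coefficients (4), not with coefficients times packets (32).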
Created attachment 216: GCC assembler output for the test case.
Created attachment 217: portion of the MSVC assembler output.
(In reply to comment #3)
> EDIT:
>
> actually ICC does not inline the assignment but the rest is properly inlined,
> the performance issue is not here for ICC. The reason is an abusive use of the
> movddup instruction which is called multiple times (4) on the same variable
> while it should be called only once per b[i].
>
> could you check the assembler produced by MSVC?

Wow, thanks for the fast response. Sorry, I'm not very good at reading assembly. I spent some time poking around, and the best I can come up with is that gcc is creating these nice "assign_LinearTraversal_CompleteUnrolling" functions whereas MSVC is not. However, I have no idea why: looking through Assign.h, everything seems to be declared EIGEN_STRONG_INLINE.
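For context on why those unrolling helpers matter: complete unrolling of this kind is done with recursive templates, where each element assignment is a separate tiny function that only disappears if the compiler inlines it. The sketch below is a simplified, hypothetical imitation (UnrollAssign, assign_unrolled and MY_STRONG_INLINE are illustrative names, not Eigen's code). On MSVC, MY_STRONG_INLINE maps to `__forceinline`, which, as far as I know, is also what EIGEN_STRONG_INLINE uses there; on GCC it uses `always_inline`, which is stronger than Eigen's plain `inline` for that macro. If any level is not inlined, the 16-element assignment becomes a chain of nested calls, which would be consistent with the slowdown reported here.

```cpp
#include <cassert>

// Hypothetical stand-in for EIGEN_STRONG_INLINE (illustrative, not Eigen's macro).
#if defined(_MSC_VER)
#define MY_STRONG_INLINE __forceinline
#elif defined(__GNUC__)
#define MY_STRONG_INLINE inline __attribute__((always_inline))
#else
#define MY_STRONG_INLINE inline
#endif

// Assigns dst[i] = src(i) for i in [Index, Stop) by compile-time recursion:
// one tiny function per element, which only vanishes if every level is inlined.
template <int Index, int Stop>
struct UnrollAssign {
  template <typename Dst, typename Src>
  static MY_STRONG_INLINE void run(Dst& dst, const Src& src) {
    dst[Index] = src(Index);
    UnrollAssign<Index + 1, Stop>::run(dst, src);
  }
};

// Recursion terminator: nothing left to assign.
template <int Stop>
struct UnrollAssign<Stop, Stop> {
  template <typename Dst, typename Src>
  static MY_STRONG_INLINE void run(Dst&, const Src&) {}
};

// Fully unrolled assignment of an N-element array from an element-wise expression.
template <int N, typename Src>
MY_STRONG_INLINE void assign_unrolled(double (&dst)[N], const Src& src) {
  UnrollAssign<0, N>::run(dst, src);
}
```

With MSVC, compiling at /W4 should report warning C4714 for any `__forceinline` function that was not inlined, which may help pinpoint the offending functions in Assign.h.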
-- GitLab Migration Automatic Message -- This bug has been migrated to gitlab.com's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.com/libeigen/eigen/issues/357.